Instant Genius - AI’s fight to understand creativity, with Ahmed Elgammal
Episode Date: May 11, 2023
Artificial intelligence has seeped into the art world, creating incredible paintings, winning art competitions, and turning amateurs into Picasso. But how does it work, and can it really replace artists? We spoke to Ahmed Elgammal, a professor of computer science at Rutgers University, to find out.
Transcript
From BBC Science Focus magazine.
This is Instant Genius, a bite-sized masterclass in podcast form.
I'm Alex Hughes, staff writer at BBC Science Focus magazine.
This week, we're talking about AI art.
In the past year, we've seen the internet flooded with a new type of art, but it isn't made by humans.
Thanks to AI-powered software, anyone can create images based on nothing more than a written prompt.
Want a cartoon of a cat fighting a dinosaur? Easy.
Need to see two teddy bears in Paris in the style of a rom-com poster? It's just one click away.
But how does this all work? And what does it mean for the future of art and the artists creating it?
I'm joined by Ahmed Elgammal to discuss this topic. He's the founder of the AI art tool Playform and a professor of computer science at Rutgers University. In this episode, he explores everything from how AI art platforms are trained through to its effect on the art world and its lack of truly creative understanding.

So I imagine by now a lot of people have tried out an AI image generator in some form or seen how they work. But what is the actual process of AI creating an image? When I search a prompt, what's actually happening?

I would like to start with a little bit of history behind how AI generates images. About five or six years ago, there was an advance in AI called generative adversarial networks, or GANs, where you can give the AI some images and it tries to generate more images similar to the ones you gave it. Give it images of cats, and it can give you images of more cats. That started a revolution in using artificial intelligence for image generation, and many artists and creatives took notice and started using it. However, there was an issue: how can you control what the AI generates?
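For readers who want the idea behind GANs in symbols: a generator G turns random noise z into an image, and a discriminator D outputs the probability that an image came from the training set rather than from G. In the standard formulation from the original GAN paper (Goodfellow et al., 2014), the two are trained against each other on the minimax objective below. D is pushed to tell real images from generated ones, while G is pushed to fool D, which is what drives its output to resemble the training images. This is the textbook objective, not necessarily the exact one used by any particular art tool.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```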
So then came another generation of image generators that use text to generate images. We can debate whether this is a good choice or not, and we can discuss that. But here is how it works. Basically, these models are trained on lots of images and their text captions, so they try to understand how the caption relates to the image. When you have an image of a bird on a tree, the model tries to guess where the tree is and where the bird is. And by doing that over billions of images, it tries to figure out which parts of an image relate to which words.
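A minimal sketch of the matching idea described above, assuming a CLIP-style approach in which captions and images are mapped into a shared vector space and compared by cosine similarity. The embeddings are hand-picked toy numbers, not the output of a trained model, and the names (`TEXT_EMBEDDINGS`, `IMAGE_EMBEDDINGS`, `best_image_for`) are hypothetical. The sketch shows only the scoring step, not how the encoders are learned from billions of captioned images, and not every generator works exactly this way.

```python
import math

# Hypothetical toy embeddings standing in for the vectors a trained text
# encoder and image encoder would produce. Real systems learn these.
TEXT_EMBEDDINGS = {
    "a bird on a tree": [0.9, 0.1, 0.0],
    "a cat on a sofa": [0.0, 0.2, 0.9],
}
IMAGE_EMBEDDINGS = {
    "photo_1": [0.8, 0.2, 0.1],
    "photo_2": [0.1, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_image_for(caption):
    """Return the image whose embedding is most similar to the caption's."""
    text_vec = TEXT_EMBEDDINGS[caption]
    return max(
        IMAGE_EMBEDDINGS,
        key=lambda name: cosine_similarity(text_vec, IMAGE_EMBEDDINGS[name]),
    )


if __name__ == "__main__":
    for caption in TEXT_EMBEDDINGS:
        print(caption, "->", best_image_for(caption))
```

Training pushes matching caption and image vectors together and mismatched ones apart; once that is done, a score like this is what links words to pictures.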
So now, after training these models on billions of images for hours, days and weeks, if we give it a text, it will first analyze the text the same way ChatGPT and other language models analyze text, to come up with a mathematical representation of the sentence you give it. Then it tries to correlate every word with where in the image it relates, based on what it has seen in all the data it was trained on, and it generates an image based on that sentence.

And I think for people who have seen the work that comes from AI, a lot of it looks incredible. It looks incredibly realistic. But I'm interested in what stops it achieving more complicated shapes. A lot of the time with hands, there'll be an extra finger, or there'll be a nose flying out of someone's ear or something. I'm curious why there are certain parts that it really struggles with.

Definitely. These models still struggle with small details. Anything that has small details, they will have a hard time generating. That's because of the way these models are trained: they are trained to optimize what's called a loss function, which is basically a criterion that the model tries to optimize. Usually this criterion is computed over the whole image, so the model tries to get most of the image correct. And when you try to get most of the image correct, you kind of neglect small details. But there are some small details that we as humans are very tuned to catch, like a hand with four fingers, a three-legged person, or a face with strange or asymmetric features. We are very good at catching that. For the AI, however, there is really no difference between these small details and any other small details in the background or in an irrelevant area that we as humans don't notice. So the current generation of these models struggles with that, because this is the way they were trained. But I'm sure in the next few months we will see new models that have been trained to take into consideration the small details that are important to humans, like where the hands and the face are, and that add them into the optimization. Then the criterion will capture these issues.

We're talking a lot about the training here, which, as you mentioned, is based on millions and millions of images. I'm curious what that means in terms of energy consumption. Is most of the energy used in the training process, with usage dropping sharply once you're actually doing searches, or is it a constant, similar output?

Definitely, training these AI models takes a lot of energy. You need to run them on GPUs for days and weeks over billions of images, and you probably have to rerun the training many, many times to optimize the process and get satisfactory results. However, even after training, to generate an image these models need to run on a GPU, or graphics processing unit, which is a specialized piece of hardware typically used for gaming. It has thousands of small processors, and these are very energy-consuming devices. Running the model to generate an image requires running these GPUs, and if you are serving thousands of users, you need multiple GPUs running 24 hours a day. That definitely has significant energy consumption and environmental impact.
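A back-of-the-envelope sketch of the arithmetic behind this answer: energy is the number of GPUs times the power each one draws times the hours they run, and retraining multiplies the training figure. All inputs below (GPU counts, 300 W per GPU, 14 days, 3 runs) are hypothetical placeholders, not measurements of any real system, and the sketch ignores cooling, CPUs and networking.

```python
def energy_kwh(num_gpus, watts_per_gpu, hours):
    """Energy drawn by num_gpus GPUs running for the given hours, in kWh."""
    return num_gpus * watts_per_gpu * hours / 1000.0


def training_energy_kwh(num_gpus, watts_per_gpu, days, runs):
    """Energy for a training job that is rerun `runs` times."""
    return energy_kwh(num_gpus, watts_per_gpu, days * 24) * runs


def serving_energy_kwh_per_day(num_gpus, watts_per_gpu):
    """Energy for GPUs kept running 24 hours a day to serve users."""
    return energy_kwh(num_gpus, watts_per_gpu, 24)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    training = training_energy_kwh(num_gpus=64, watts_per_gpu=300, days=14, runs=3)
    serving = serving_energy_kwh_per_day(num_gpus=16, watts_per_gpu=300)
    print(f"training: {training:,.1f} kWh")
    print(f"serving:  {serving:,.1f} kWh per day")
    print(f"days of serving to match training: {training / serving:.0f}")
```

Changing `num_gpus` in the serving call shows how quickly round-the-clock serving for many users adds up relative to a one-off training job.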

And on these training processes, something OpenAI discussed quite openly when they were working on their DALL-E project was the issue of biases, incorrect information, or a tendency to lean into certain cultures, which would leak in from the information they were collecting. Is there a better solution to that problem? Is it a case of needing a list of approved content that has been vetted, or that covers a broad base of different cultures and beliefs?

That's a very important issue: how can we control the data given to the AI? It's very critical, because there are different opinions in the world about everything: politics, religion, lifestyle, everything. So we cannot censor the data given to the AI to weigh certain voices more or less. That would be very unfair. I think AI naturally has to reflect the different opinions, viewpoints, cultures, religions and political views in the world. AI has to learn from all of that, and we cannot really censor it, so this will come with a lot of the misinformation that we live with. That's part of our life. In the same way that you look at your feed on a social media platform and filter out, or judge, that this is false information or this is true information, we have to be trained to know that AI can generate false information, because this is how it works. It digests what we give it and renders it in its own language, so we should not take whatever AI gives us as proof. It's basically a rendering of what's out there in a new way. It can be valid, or it can be invalid. This is how we should look at AI generation. Right now, AI has no way to tell fact from fiction. For it, everything is just words. And once we start talking about facts, that's a big problem: what are the facts and what are the opinions? That's a much harder thing to do.

I think one of the other big issues that has been raised a lot is around the rules and regulations on how this technology can be used: who owns the copyright, who owns the images that were originally fed into it, and how it's used when it's monetized. You could say it's a sort of Wild West situation right now. Do you think this is something that laws will clamp down on over time, or is it at its peak now while it's unregulated?

The copyright issue is a new issue that comes with the current generation of image generators, which are mainly trained on billions of images. It was not the case a couple of years ago, when artists used AI through models like GANs, where you can actually train your own model using your own images. So copyright was not that big an issue at the time. The issue comes now from the fact that these models are trained on billions of images taken from the internet without the consent of the artists or the people who made the images. Now you can regenerate these images, or generate a pastiche of them, a mix of these images. That is very problematic, because in one sense it's unethical. In another sense, it is not violating copyright law, because what you generate is usually a transformative version of the image, not a direct derivative. So under copyright law this will usually not be a problem. But it's still unethical.

This is a big problem because copyright is now a three-party problem. The person generating the image might be violating the copyright of some other artist they don't know about. And the third party is the developer of the AI, the company that developed it and trained the system on these images. So it's now a three-way copyright problem, which is a very new situation. It's a big problem, and I don't think copyright law right now is capable of doing anything about it. There are some initiatives to clean up the datasets used for training and the use of artists' content, things like that, but I hardly see how this will work. We're talking about billions of images. The latest datasets used to train these models contain around 5 billion images, and we'll probably soon see more than that. There are incentives to ignore this problem, both for the big companies that train these models and for some users of these models, who are very happy to be able to generate images in the style of their favorite artist [unclear]. So it's basically a mess at this point.

However, I have to remind everybody that this is not the way it has to be. You can use AI to generate images by training it on your own data and your own images, without violating anything. And I even have concerns about text prompting as a way to communicate with AI systems. Is telling the AI what you want using text the best way to generate an image? Is it what the artist wants? Maybe consumers would like to do that: somebody who's not an artist, who would like to generate an image, writes down a text describing what they want. But an artist wouldn't do that. Artists are visual thinkers. They prefer to think visually, not through text. Having to describe what the art will look like is very foreign to an artist. That's not how artists work. So controlling the generation using text has many problems. Copyright is one of them. But as a creative process, it's also very unnatural and very restrictive for many artists. I see it as a consumer-facing technology more than an artist-facing technology. And these problems with copyright are really fundamental.

So do you think the products that are best known currently are all designed for consumers? Is there an option, or a chance, to also make a product that is better aimed at artists?

There are platforms that are aimed at artists. Playform AI, for example, is a platform that we developed back in 2019, before the current generation of text-prompting AI generators. It allows artists to train their own AI on their own data, without any copyright issues, and without being restricted by text prompting or having to think through a text pipeline. So these platforms have been around. I think the new generation of text-to-image platforms is very popular because it is more consumer-facing, and it makes AI more easily accessible to the masses, who can just write a text and generate an image. But artists have been using AI for the last five years, even before these platforms, and generating amazing things that have been shown in exhibitions and museums.

One thing I should also mention is that the use of AI in making art used to be welcomed in the art market, in exhibitions and in galleries. But now, with this new generation of text-to-image generators, there is a big debate, and many art communities have started to ban AI images from their platforms, for example on Reddit and Discord. This issue was not there in the previous five years. Many artists had been using AI without a problem, and it was welcomed because they were using AI based on their own images and their own data. It was a creative process, a conceptual art process, and it was welcomed. But now, when you're dealing with a big model that has been trained on billions of images, your job as an artist is basically to reverse engineer the system, plugging in certain keywords to manipulate it into generating an image. That's very little of a creative process. Creativity here really becomes a process of reverse engineering the system, not visual thinking or a natural way of working for artists.

Today we've mostly touched on AI and its ability to make images, but it's been shown to create text, music and a bunch of different forms of art. In your eyes, is there a form of digital creation that it would struggle to create, or struggle to match humans in?

I think for most content that we can digitize, whether it's text, music, images or videos, anything that we can put into a digital format, AI can be trained to generate such content. Does that mean AI is good at generating it? It can be, and it can be not. Anything that requires a higher level of semantics, AI might struggle with. For example, AI can create poetry. Yes, you can try it with ChatGPT and it can generate some nice poetry, but the level of sophistication of the poems generated by AI is really very limited. Maybe they are better than what you or I would write, but if you read them, they are very naive, not really thoughtful. That's where human creativity still surpasses AI, and will for a long time. The same goes for writing a novel. Yes, AI can write a novel, but it will be nothing compared to a great novel by a great writer. Because there's one very important thing we have to realize: AI doesn't generate art. AI generates images, or music, or notes, but not art. Making an image doesn't make you an artist. What makes an image art is the artist behind the scenes who uses AI to make that image. That's when it's called art. So there is a human behind the scenes who uses this machine, this power to generate images, and iterates, keeps going, and selects and curates from the outcome. This whole process is what art making is. The AI is just filling in one part of that: it is a tool that can generate this content. The same thing applies if you are a creative writer. Yes, you might use AI to write certain paragraphs for you, but in the end, you are the one who has the idea in your head, and you are the one who wants to create this whole novel or this whole story. AI is just going to be a way to help you. We also have to remember what art is. Obviously, art and literature are ways to communicate between people. In the end, art is communication between humans. So AI can generate things, but AI doesn't have a message to communicate. The message that is communicated comes, again, from the human behind the scenes who writes the novel or the story, composes the music, or makes the art.

And is that something AI could learn to replicate? Can it learn to experience that creativity and that flair? Or is it always going to be replicating a prompt as accurately as it can, and used more as a tool for someone else's creative vision?

I believe the current generation of AI is limited to creating content that imitates human content at large, and it has to be largely controlled by humans to be something useful. It can be a great tool, but it is not yet at the point where it has its own consciousness about the world. This is really a big issue. To be an artist or a writer, you have to have a consciousness about what's happening in the world, an opinion about what's happening in the world, and your own voice. And you use art and literature to communicate these ideas. The AI doesn't have that. As long as AI doesn't have consciousness, it cannot be an artist. It's basically as simple as that. It can be a great tool for artists to use, but I cannot call it an artist or a musician.

A couple of years ago, we used AI in a project to generate Beethoven's 10th Symphony. And by the way, this used the same kind of language models that are behind ChatGPT and others. We trained the AI on lots of classical music from before Beethoven and on Beethoven's body of work. Then the AI looked at the sketches Beethoven left for a 10th symphony and tried to generate completions of these notes, as well as harmonization and orchestration. That project was a great example of how humans can use AI to make something interesting. In the end, AI was a tool in the process. The composer and the team told it what to do: here's a couple of melodies, complete this for me. Then they took the output and fit it into the movement where it belonged. Then they iterated, went back to the AI, and asked it to orchestrate certain pieces or combine certain pieces. So it was an elaborate process of back and forth between the humans and the AI until the work was done. I see this as a great model of what's happening now in the world. Basically, everything happening now in terms of using AI is a cycle, a creative process in which the human is mainly in charge, and AI follows the rules to generate content for that human to fit into his or her project.
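To make "complete this melody" concrete, here is a minimal sketch of sequence completion. It deliberately swaps in a toy bigram (first-order Markov) model, which is far simpler than the transformer-style language model described in the interview; the training melodies and note names are made up. As in the project, a human supplies the sketch and decides whether to keep the result.

```python
from collections import Counter, defaultdict


def train_bigrams(melodies):
    """Count, for each note, which notes follow it in the training melodies."""
    follows = defaultdict(Counter)
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            follows[current][nxt] += 1
    return follows


def complete(sketch, follows, length):
    """Extend `sketch` greedily with the most frequent next note until it has `length` notes."""
    notes = list(sketch)
    while len(notes) < length:
        candidates = follows.get(notes[-1])
        if not candidates:
            break  # this note was never followed by anything in training
        notes.append(candidates.most_common(1)[0][0])
    return notes


# Made-up training melodies standing in for a corpus of real scores.
TRAINING_MELODIES = [
    ["C", "D", "E", "F", "G"],
    ["E", "F", "G", "A", "G"],
    ["C", "D", "E", "D", "C"],
]

if __name__ == "__main__":
    model = train_bigrams(TRAINING_MELODIES)
    # The human supplies the sketch and judges the completion.
    print(complete(["C", "D"], model, 8))
```

Always picking the single most frequent next note makes the output deterministic and quickly repetitive; real systems sample from learned probabilities and condition on much longer context.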

In the future, do you think we're going to see AI developing into something that's more advanced, but less creative? OpenAI was working on a project called Point-E, a method of generating 3D shapes from prompts in the same way that ChatGPT or DALL-E works. Is there room for us to start seeing an AI generator that creates full 3D environments, or entire worlds that could be explored?

Yes, and that will be a really interesting thing to see. Generating a 3D environment in graphics, for gaming or for virtual environments, is very time-consuming, and if we had the AI and machine power to generate that for us based on our instructions, that would be really great. Another example is video. When you create a video showing certain things, it takes a lot of time and effort for humans to shoot the footage, edit it and montage it. So any tool that can create video clips showing what you want would be really great. For all these massive tasks that take humans a lot of time, I think AI will play a big role in helping. But in the end, in the creative process, the human is going to be like the creative director, and AI tools will be, well, not people: basically digital slaves working for that human, that creative director, to create whatever they want, whether it's music or art or literature. And fortunately, the AI is not conscious, so having a digital slave in that sense is totally fine. There's nothing unethical about that. I think another good thing is that it makes this possible for artists who don't have the means to have a studio assistant or somebody working with them in a workshop. They can have access to this technology, so they can really grow and do what they want. In the past, we have seen artists like Andy Warhol and others with big workshops of creators working for them and doing what they want. But an emerging artist doesn't have that ability. So as an emerging artist, you can use AI to really help you create things at scale.
A couple of years ago, we did a study where we asked artists who had been using AI at that time, four or five years ago, when AI was very limited compared to what it is now, why they were using AI and what they found valuable about it. We found there are mainly two things artists like about AI. First, it creates novel ideas that they hadn't thought of. The AI looks at the world through different eyes from our human eyes and can give them new ideas. They like that very much. The other thing is volume: AI can create lots and lots of assets and images for them to use in their projects. This is something that takes a lot of time, and sometimes they have to hire assistants to do that job, so the fact that it can be done very fast using AI is very valuable. These are the two value propositions of using AI in the creative process: that it can give you new, out-of-the-box ideas, and that it can give you creative volume based on lots of data.

However, looking at what's happening now with text prompting as a way to generate, I think we are losing the first one. We are losing the fact that AI gives us out-of-the-box or out-of-the-ordinary ideas, because AI is now constrained by our language. It looks at the world through the lens of our own language. So we have added a constraint that limits the AI's ability to be imaginative, or to generate visually interesting concepts. However, that's very useful in other contexts: if you are using AI to generate something linguistic, like text, or something very structured, like music, it's very important to have language in the process. So I think we have a long way to go in terms of how AI can fit into the creative process for different artists, and what we see now is still just the early stages of what is possible.

When you look back through history, we've had a huge range of art movements over time: cubism, impressionism, realism, all these different stages throughout history. Is there an argument that AI is simply the latest art movement, or is it something more than that?

I think in the last five years we have seen this, and I think it has already ended. The early artists who used AI in the last five years, especially using GANs, had a specific aesthetic in their work, because AI was in what's called the uncanny valley. It produced these uncanny-looking images. If you look at most of the art made by many artists in that period, it has this look, this style, this feeling and aesthetic. I think now AI is becoming more photorealistic: very good at generating photorealistic images, very good at generating graphics, very good at imitating styles. And it has lost the ability to be surprising and uncanny and to have these surreal effects. Some people even gave these last five years a name. I remember some people called it GANism, after GANs. But I think this is gone. It is already a period in our history of using technology to make art. It was a really amazing period, and it's already gone. Now AI is becoming more of a tool for everybody to generate a photorealistic image, a graphic design, a logo, but it has lost the unique aesthetic it used to have.

Thank you for listening to this episode of Instant Genius.
That was Ahmed Elgammal talking about AI art.
The Instant Genius podcast is brought to you by the team behind BBC Science Focus magazine,
which you can find on sale now in supermarkets and newsagents,
as well as on your preferred app store.
Alternatively, you can come and find us online at sciencefocus.com.
