The Current - The greatest artist of the 20th century? AI’s answer and why it matters

Episode Date: June 17, 2025

AI is exploding. It’s everywhere. And almost everyone is using it. From writing emails to generating lifelike videos, to booking appointments, artificial intelligence is moving beyond simple prompts... and into what experts call “agentic AI” — systems that can act on our behalf. CBC’s Nora Young joins Matt to talk about this moment in AI. She’s been testing some of the newest tools, including Google's Veo 3 video generator and OpenAI’s latest web-browsing agents.

Transcript
Starting point is 00:00:00 Ten years ago, I asked my partner Kelsey if she would marry me. I did that, despite the fact that every living member of my family who had ever been married had also gotten divorced. Forever is a Long Time is a five-part series in which I talk to those relatives about why they got divorced and why they got married. You can listen to it now on CBC's Personally. This is a CBC Podcast. Hello, I'm Matt Galloway and this is The Current Podcast. It seems like every day we're learning about new advancements in the world of artificial intelligence. Whether it's sophisticated research tools or realistic video, AI isn't just getting better, but more and more people are actually using it,
Starting point is 00:00:48 including Nora Young. She is the CBC's senior technology reporter with the Visual Investigations Unit. Nora, good morning. Good morning. Let's start, there's a lot to talk about, so let's start with text generators. Yeah.
Starting point is 00:01:01 What are the big ones that people can actually use right now? Yeah, I mean, this is probably what most people have played with a little bit, chat GPT being the most famous of them, but there are also other big players. Microsoft has copilot, Google Gemini. There's also Claude by a company called Anthropic that a lot of people are interested in. Um, and you know, as you know, these things can hallucinate or make things up. Um, and that's because they don't really have an actual understanding of the world around them the way that you and I do. So you do need to double
Starting point is 00:01:28 check especially important info. But the thing that's changing is that they've recently gotten pretty good at actually giving you answers. They can actually look stuff up through sources online and cite those sources. They each have their differences, though, in terms of the answers that you get. For instance, I was monkeying around and asking the question, who is the greatest artist of the 20th century? And they all gave broadly similar answers, but Google Gemini included musical artists. So that kind of points to the need to be thoughtful
Starting point is 00:01:52 about what you're asking them and the kind of prompts you're giving them. Who is the greatest artist of the 20th century? Do we know? Most people seem to say Picasso. Most people, most bots seem to say Picasso. Not Prince. Are they different?
Starting point is 00:02:04 I mean, you mentioned Claude and then there's ChatGPT. Why would somebody choose one over the other? Yeah. I mean, part of it is tone and user interface. The most recent iteration of ChatGPT has been called fawning by some people, obsequious by others. Claude has this cool feature that lets you choose and customize the style of its output, but it also kind of depends on what you use it for.
Starting point is 00:02:24 Like if you wanted to turn out a report of some kind, Google Gemini is good at laying out its rationale or its research plan and letting you kind of tweak that. But if you want to use a chatbot for, like, self-reflection, maybe you want a new career or something, you might want something that presents a little more empathetically. So if you're new to these things, I'd encourage you to experiment and see what fits your needs, with the proviso that you should check the policies around the use of your data as training data, because there's stuff that you may want to opt out of.
Starting point is 00:02:51 So I wanted to have this conversation with you because I wanted to talk about how we could be using AI and how you, tech expert, are using AI right now. You've been playing around with some of this stuff. We'll get through some of the other parts of this, but with the text stuff, what has impressed you? Yeah, and it's funny, you know, you and I have had these sort of water-cooler conversations over the last couple of months
Starting point is 00:03:13 and we were thinking about, like, okay, what practically are the use cases here? A lot of it is that there's now this ability to sort of look things up, but also to add your own documents. And if you're knowledgeable about the subject area, it's better, because you can kind of see where maybe there are gaps in what it's producing. So its kind of data-wrangling function, I think, is potentially quite powerful. But you've monkeyed around with these things too. Yeah, I found the deep research stuff that Google's Gemini has really interesting. It does not replace actually doing the work,
Starting point is 00:03:45 but maybe it supplements doing the work or speeds up doing the work. I put in prompts for very complex issues and said, just pull together all the information about this with one specific tagline. And it essentially creates a report with footnotes. And this is what we've cited. This is the information that we found that we didn't cite. Would you trust it? I'm not entirely sure, but it is interesting. Yes. And we all do this all the time already anyway, right? We look things up on Wikipedia, which we may not find 100% reliable, or maybe the information will be different tomorrow, but for some things that's completely fine.
Starting point is 00:04:18 Right? Like, so it's not like we live in perfectly pure information environments. We make these kinds of decisions all the time about how much certainty we need to have about something and when we need something that's more or less authoritative. But I do think that that citation thing is an important piece. So that's the information part of it and the text part of it. Um, AI image generation has been the subject of great discussion and debate.
Starting point is 00:04:39 Where are we at with that? So much has improved in the last little while. You know, even six months or a year ago, we might've been talking about, oh, the hand has 17 fingers or whatever, and that's not really happening much anymore. And again, you're seeing this prominent player, OpenAI, with GPT-4o, which a lot of people have been talking about recently. What is that? It's just their image generation.
Starting point is 00:05:00 Remember, like there was this period where people were making those cartoons from the Japanese animation studio, Studio Ghibli and talking about how powerful this kind of tool was. There's also other ones, there's mid journey, there's runway which was used in the film, everything everywhere all at once. But I noticed the video generation is getting a lot better although sometimes you find there's still this weird kind of sponginess to the movement that doesn't really reflect how bodies are in real physical space. But the latest thing that people are super excited about is Google's
Starting point is 00:05:29 newest video generator, Veo 3. Explain how this works. I've played around with it a little bit. I don't know that it's particularly sophisticated, but then I see things that people have done and it's incredible. Yeah, I mean, it's wild. The biggest, most obvious addition here is that you can now type out dialogue and it'll translate it into very realistic speech. So just to give you a sense of how it sounds, here's the Veo promotional clip, with just a little snippet of a speech from a grizzled old sailor. This ocean, it's a force, a wild, untamed might. And she commands your awe with every
Starting point is 00:06:03 breaking light. Now, obviously they picked a particularly good example, since it's Google Veo's promotional video, but still, that's something that somebody typed in and that comes out. So explain how that works for people who've never used this: you type in not just the dialogue, but what you want the little film to be. Yes, and this is the interesting point about using these things: it's not just, you know, make me a movie of the old man and the sea or whatever. You have to think about how these things work and how you translate what you may be thinking about having visually
Starting point is 00:06:35 into written words that you can then share with it. And you can get better at these things. You can get better at prompting. You can get better at saying, this is stylistically what I want, this is what I want the character to be like. I asked it for a video of a piano rolling downhill on those very steep hills in San Francisco. And I mean, it made that, with people chasing the piano. In some ways, the limits of your imagination are kind of the limits of what you can create, right?
Starting point is 00:07:02 Yeah, yeah. In the best case scenario. And you know, I think the more that you know about, there's still a place for knowing things about art and cinema because the more you know about that, the more you bring your aesthetics to how you get it to produce what you want it to produce. What could possibly go wrong with this? I know, I know. Everything.
Starting point is 00:07:16 This is the scary thing, right? I mean, given that so many of us get our news from social media, the potential for fake, AI-generated images circulating could skew our reality. There have been fake videos about what's been going on in LA, for example. And it doesn't necessarily even have to be politically motivated. I mean, there's money to be made in so-called AI slop that just circulates and generates engagement and clicks, right? And there's the potential for misinformation more broadly. The thing about these things is they're
Starting point is 00:07:45 great at generating stuff that's a bit formulaic, right? Like the grizzled old sailor or whatever, because there's a clear format. There's probably tons of training data. So it's very good at generating clips of TV news anchors. With apologies to our friends at CBC TV News, there is a format for TV news, right? You sit in front of a desk, and there's a camera, and you're talking. Exactly, yeah, and a certain aesthetic that's usually there. So there's the potential to generate what looks like a really realistic clip of a news anchor delivering a realistic-sounding message about anything, about politics, for example.
Starting point is 00:08:19 As we've said, there are limitations to this already. But what a friend of mine keeps reminding me is that when you use it today, it's as bad as it will ever be. It will be better in five minutes; it will certainly be much better in a day or two. Yeah, yeah. I mean, there are definitely limitations with Veo 3. Like, the videos are very short, for example.
Starting point is 00:08:40 Like all generative AI, it's computationally intensive. And that also means that it's energy intensive, right? So there are questions that come to mind about the sustainability of these things. But as your friend says, you know, it's getting better and getting better like very, very rapidly. So how do you think about that? How do you just, we'll get to some of the other ways it's being, how do you think about the energy that creating the three second video is requiring?
Starting point is 00:09:06 I mean, I think we're all at the playing-with-it stage right now, but what I often think of is, if we're going to be using all of this computational power and all of this energy, shouldn't we be thinking about what it's in the service of? I want to play around with these things as much as anybody, but if you think about millions and millions of people doing this all the time, what should be the aim for which we're using these incredibly powerful technologies?
Starting point is 00:09:35 The next frontier in all of this seems to be something called, am I saying this right, agentic AI? Yeah, yeah. What is that? Yeah, they call them the AI agents or agentic AI. You probably already use sort of a proto version of this. If you've asked Siri or Alexa to play a song or dealt with a chatbot at a customer service site, it's basically an AI that can interact with the world and accomplish some kind of task with
Starting point is 00:09:58 some degree of independence. So already, if I ask Gemini to create a resume for me and it goes out and cobbles that together from my LinkedIn profile and my personal website and whatever, that's an example that we're already seeing in an early stage. But that's just sort of the beginning. So what is the promise? What are the companies saying that you will be able to do through these agents? Yeah, I mean, OpenAI's Operator is the most prominent new version of this. And the idea here is that these AI agents are going to be kind of like a personal assistant, right? Google's Project Mariner is also working
Starting point is 00:10:28 in this space, but these products are still really in the early stages. And even though you can use Operator with a Pro account, it's more like a beta test or, as they call it, a research preview. So not quite ready for prime time. So practically, though, how far along are they? Could I use it to book a plane ticket? Could I use it to plan a party or what have you? Yeah, I mean, this is where the accuracy piece comes in. Remember, I was talking about checking your work, right? It's one thing to ask an AI to create a resume, where you can look it over and know whether it's right or not.
Starting point is 00:10:57 But as computer scientist Arvind Narayanan points out, a lot of these personal-assistant-type tasks involve a lot of variables, like knowing, you know, do I want a window or an aisle? Do I want to fly at night or in the morning? Et cetera, et cetera. And even then, you have to check that things worked, to make sure it didn't book you on a flight that it hallucinated. If that was a person, you would fire that personal assistant. Right. I mean, you might as well book it yourself at that point, right? So that accuracy piece is the main thing.
Starting point is 00:11:22 Where is it working well now? I mean, in addition to those sort of proto agentic things that we talked about, it's being used in these kind of constrained specific domains without 100% having to trust it. So I talked to Graham Newbig about this. He's associate professor at the Carnegie Mellon Language Technology Institute and chief scientist at a thing called All Hands AI. And he uses them for research and programming, and Matt, he uses them a lot.
Starting point is 00:11:47 I use agents probably 10 to 20 times per day. For most of the work I do, I use our own agent, OpenHands, because I'm developing it, so it does the work that I want to do. But I also use ChatGPT together with it. One thing is, because each of them can miss things, if I really want to be thorough in my research, I'll just ask four of them to do research
Starting point is 00:12:11 and then combine together all the results and get like the most comprehensive coverage that I can give. It's like the AI hive mind. Yeah, exactly. I mean, I've used some of this to plan trips. Oh yeah. Thinking we're going somewhere, I want to go hiking. And the more
Starting point is 00:12:27 precise the prompt you give it, the more specific the information you get back, and you can refine it over and over again. But I still am checking based on other information as well. How far off is the point where I'm going to be able to say, this is where I want to go, and it will do all of the planning for me? It will book my plane ticket, it will book the hotel, it will give me, you know, suggestions on where I want to eat, maybe book those restaurant reservations and give me, you know, an itinerary for five days when I'm going hiking in the mountains. I mean, the unfortunate answer, Matt, is that it's kind of hard to tell. But, you know, Graham says it'll be something around two years, but that's a ballpark. I mean, people are working now on something called Model Context Protocol. You
Starting point is 00:13:11 heard it here first, Matt, which is the idea of making it easier for agents to interact with apps and websites sort of directly. So the agent wouldn't be, like, reading through the terms of service at aircanada.com or whatever, the way you and I would; they would be kind of interacting more directly. Companies that want to be useful for these agents would optimize so the agent can essentially interact with them directly. What does that mean for how we live, and how we live with technology? Do you know what I mean? Yeah, I do. I do. And I've thought about this a lot, because for a long time, my experience
Starting point is 00:13:44 of the digital world was mostly through the web, right? And that's the web that I think is really changing. For one thing, we're already increasingly not using it to search for links to websites. And if websites start to look like they're things for machines to get information from, rather than things for humans to read, that's going to change what the web is for, how it looks, and how it's used. I think it continues this pattern we've seen more broadly with tech, which is that we get further and further abstracted from it and less hands-on.
Starting point is 00:14:12 Do you believe the promise of this? I mean, just as a last point, there's so much hype around this, and people are using this technology already, but they're trying to figure out whether the promise is actually real. Do you believe in that? I believe in the promise of it being a better way, or a different way, to deal with the giant gobs of information and data that we have around us, and to use that in productive ways. What I don't see in that picture is really valuing the specifically human, interpersonal
Starting point is 00:14:43 piece, right? There is something, at a very simple level, about you and me having a conversation about traveling, and I'm asking you because you were there physically, you smelled the smells, you ate the food, and you can tell me things, whereas a system has not been to those places, doesn't smell anything, does not really understand the concept of France or whatever, right? What I feel we haven't really captured is how the very human ability to know the world around us works with these systems that are computationally fascinating but don't really, truly know anything about the world. There is value in asking you about France
Starting point is 00:15:17 because you know about France, the computer does not. Nora, thank you. My pleasure, thank you. Nora Young is a senior technology reporter with the CBC's Visual Investigations Unit. You've been listening to The Current Podcast. My name is Matt Galloway. Thanks for listening.
Starting point is 00:15:30 I'll talk to you soon.
