The Decibel - AI hype vs. AI reality
Episode Date: June 19, 2024Artificial Intelligence has been creeping into our lives more and more as tech companies release new chatbots, AI-powered search engines, and writing assistants promising to make our lives easier. But..., much like humans, AI is imperfect and the products companies are releasing don’t always seem quite ready for the public.The Globe’s Report on Business reporter, Joe Castaldo is on the show to explain what kind of testing goes into these models, how the hype and reality of AI are often at odds and whether we need to reset our expectations of Generative AI.Questions? Comments? Ideas? E-mail us at thedecibel@globeandmail.com
Transcript
Discussion (0)
.
Big tech companies have been leaning into artificial intelligence.
We unveiled the new AI-powered Microsoft Bing and
Edge to reinvent the future of search.
We want everyone to benefit from what Gemini can do.
You're using it to debug code,
get new insights, and to build the next generation of AI applications.
Mike, it seems like you might be gearing up to shoot
a video or maybe even a live stream.
Yeah. In fact,
we've got a new announcement to make.
Is this announcement related to OpenAI perhaps?
It is.
I'm intrigued.
In fact, what if I were to say that you're related to
the announcement or that you are the announcement?
Me? The announcement is about me?
Well, color me intrigued.
The promise of generative AI technology is that it will change our lives for the better.
But so far, many of these AI rollouts are not living up to expectations.
There's the Microsoft Bing chatbot last year that expressed love for a New York Times reporter and suggested the reporter
leave his partner. Joe Costaldo is with the Globe's Report on Business and has been extensively
covering artificial intelligence. Google's Gemini image generator, which did things like depict
America's founding fathers as not white chatbbots inventing legal citations and getting lawyers in trouble.
Even back in 2016, Microsoft released a chatbot on Twitter called Tay, and people very quickly figured out how to make it say quite heinous things.
And Tay was never heard from again. Today, he'll tell us why AI hype is often different from reality,
why companies roll out these technologies that don't seem quite ready, and what that does to
public trust. I'm Manika Ramanwelms, and this is The Decibel from The Globe and Mail.
Joe, great to have you here. Thanks for having me.
So Joe, we've now seen a number of launches for new AI tools.
Just tell me, though, about, I guess, the usual pattern that these launches often tend to follow.
Yeah, there does seem to be a bit of a pattern.
Generally, there's a lot of hype that this new tool or model is going to improve our lives, improve the way we work or get information. And then,
you know, the thing is released to the public and it kind of falls flat. It looks a little
half-baked. People very quickly find out all the ways that this new model or application fails or
makes things up and gets things wrong or says things that are just unhinged and share examples
online. And there's,
you know, sort of this negative media cycle of bad headlines and bad PR. Sometimes the company,
you know, acknowledges the mistake and promises a quick fix. And it just seems to happen again
and again. And Google being the latest example with its AI overviews in search.
Yeah, let's talk about that as an example,
because I think this is in people's minds a little bit,
because this was fairly recent.
What happened there?
How did this not go exactly as planned for Google?
Yeah, so this was a big change, actually, to Google Search,
which makes the company billions of dollars,
and it's like our gateway to the internet,
you know, has been for years. And so they started putting on top of search results,
an AI generated summary of whatever the query was.
And this was probably, this wasn't in Canada. It was only being tested in a few places, right?
Yes. Google only rolled it out in the US to start last month and their plans to roll it out into other countries, including Canada down the road.
And Google pitched this as a way to get information, you know, faster and easier,
like you don't have to do the hard work of clicking a link yourself. And so very, again, very quickly, users started noticing that these AI overviews could be wrong or just flat out nonsensical.
So an AI overview recommended eating rocks, eating one rock a day for the nutritional benefits.
In a pizza recipe, it included glue as a way to, you way to get that cheese to stick to the pizza.
And they're just flat out factual errors. One query was, who was the first Muslim president
of the United States? And the AI overview said Barack Obama, perhaps picking up on this
conspiracy theory that he's some kind of secret Muslim. So Google responded fairly quickly and said, you know, these instances are rare, but they
also made, you know, about a dozen technical fixes, they said.
And they were clear that this isn't a case of AI hallucinating, which is the phenomenon
where an AI model just makes stuff up.
That is actually the term people use, hallucinating.
Yeah.
And it's, you know, people debate if that's an appropriate word or not, but, you know, AI,
generative AI makes stuff up, basically. And they said it wasn't so much that, it was more
pulling from websites, maybe that it shouldn't have been. And so, you know, they tried to address
that problem. And it just has the appearance of something that was released when it wasn't quite ready.
Yeah. So, I mean, this seems to fall into the pattern of what you were talking about earlier, Joe, right, where we see this happen with these new releases that aren't quite set for the public.
But I guess the big question is, why is this happening? Why does this continue to happen?
Yeah, there's a few reasons. I mean, I think the obvious one is just competition.
When ChatGPT was released toward the end of 2022, it really touched off an arms race where every company had to do AI now, like generative AI was seen as the next big thing, the next huge
market opportunity. So companies are willing to make mistakes and risk some bad PR in order to get
something out, in order to be seen as first. And like, if you're too slow, there are consequences.
Google, for instance, has been developing generative AI internally for a long time.
But it wasn't necessarily releasing everything to the public. When OpenAI released ChatGPT,
all of a sudden there's a lot of pressure on Google
to start doing something with all of this research.
And as a quick example,
Google had an image generator in early 22 called Imagine,
but it wasn't released to the public.
And the team that made that later left Google
and started their own company in Toronto called Ideogram,
partly because they felt they could move faster outside of Google.
And look at Apple, too.
It's one of the few big tech companies that wasn't really doing anything with generative
AI.
It's a device company in many ways.
And so there are a lot of questions about what does AI mean for Apple?
And they finally had an event recently where they're going to partner with OpenAI and integrate generative AI into iOS in a bunch of different ways.
And the stock price is up quite a bit since that event.
In response to that.
Yeah, I think because investors, there's some relief on the part of investors who can say, OK, finally, like Apple is doing something with AI now.
OK, so what you're describing is really this pressure on these companies to kind of keep up with each other and roll these things out, even if they're not totally ready yet.
It's interesting. I think we should dig a little bit deeper into this idea, because this concept of releasing something, even though it's not really set, seems to be, I guess, part of the Silicon Valley mindset, if I can say that, Joe.
Like, it goes beyond just AI.
Why is it the way that these companies tend to operate?
Yeah, there's a couple of things there. of something, some tool, some application that a company will build and release to test
like market demand, customer need, rather than spend a lot of time and money releasing
something complete that might flop.
So it's, you know, it can be a smart way to do things.
Make sure there's a market before you throw a lot of money into it.
Exactly.
Yeah.
And also, you know, the move fast and break things ethos has been part of Silicon Valley for quite some time.
You know, Facebook kind of being the poster child for that, like when the company was really growing, it endured a lot of scandals about, you know, privacy concerns and data breaches and being, you know, hijacked to manipulate elections and so on.
So, yeah, that mentality is there in tech.
But I think there's something a little different going on with generative AI.
Like Facebook, for all of its faults, the core product more or less worked.
You know, you add friends, you post pictures, you like,
you comment, you get served up ads. Generative AI is different in that this technology is a bit
more unwieldy. It doesn't always behave the way that you want it to. It makes mistakes. It outputs
things that are not true. And that's not a bug that can be fixed with some more work and coding.
It's just inherent in how these AI models work.
And companies are doing lots of things to try to improve accuracy.
But it's a very, very hard problem to solve.
And so until that's addressed, we'll see more flubs and mistakes and launches that go sideways.
I guess, why is it that these problems are so complex when it comes to generative AI? I mean,
maybe that's obvious, but I guess, yeah, why are the problems that we're seeing now,
why aren't they as easily fixable as previous problems, like you were saying with Facebook
before? Yeah, I mean, this is a simplification, but, you know, with a chatbot, for example, or the large language model that underlies the chatbot, you know, it's effectively predicting the next word in a sequence based on, you know, tons and tons of data that it has been that it has analyzed.
But an AI model has no idea what is true and what is fiction.
We'll be right back.
Joe, can we talk a little bit about how these products are tested? Because, of course,
companies are testing these models before they do roll them out. Do we know what that means,
though, exactly? Like, what kind of tests are actually run on these tools? Yeah, there's a concept called red teaming, which is fairly big,
where employees, you know, team of employees tries to like test the vulnerabilities of an AI model,
like, can you make this AI chatbot say something it's not supposed to? Can you make it say
conspiracy theories or something discriminatory?
So red teaming is kind of like trying to break it in a way to see if it will break.
Yes, exactly.
Like ethical hacking in a way so that, you know, you can better understand the vulnerabilities
and fix them before it's released to the public.
So that's a big focus, but it's not sufficient.
So I guess still, though, like, why are we still seeing these problems, even with these measures,
this testing that is happening? Why is that still such a struggle?
Yeah, it's hard to know without having insight into our, like, a particular company before a
release. But I mean, red teaming, you know, there are, you know, tensions between needing to
commercialize something and, you know, making sure it's safe. Like there has been a shift where,
you know, generative AI previously was, you know, it's kind of a research project, like, you know,
university labs were working on it, corporate labs were working on it, again, with an eye to,
you know, commercialize something down the road, It wasn't seen as like ready for public release. So there's presumably some tension there. You know,
with red teaming, like, do they have enough time? Are there enough people? Is the team diverse
enough to, you know, find bias and other vulnerabilities? This is something that we've talked about generally that can be an issue with tech.
How does that play into this?
Well, so let's take image generation, for example.
It's a well-known problem that image generators have bias and stereotypes kind of built into them.
And that's just a reflection of our society and our bias and our problems because AI models
are trained on data that we as humans put out there in the world.
So if you ask an image generator to produce a picture of a CEO, chances are it'll be a
white man, a doctor, a man, a nurse, a teacher, a mugshot.
Chances are it might overly represent black people, for example.
So it takes, you know, a diverse team to think about these issues and try to address them before launch.
In Google's case, with its Gemini image generator earlier this year,
it may have overcorrected. So people found that it was producing historically inaccurate pictures.
So again, like America's founding fathers as black people, for example, or German World War
II soldiers depicted as people who are not
white. Google was trying to inject more diversity into the output, but went too far, perhaps. And
Google paused image generation on Gemini so that it could, you know, work to address this.
There was a bit of a narrative that like, oh, AI is woke. It's too woke. That's the problem,
which is, you know, just silly. It's not about that. It's
just an indication that these models are hard to control, hard to have accurate, predictable
output, and some of the blind spots of teams that are developing them.
Yeah. Yeah. It seems to really illustrate that. I'm wondering what experts, I guess,
have told you about all of this, Joe, because obviously companies, I guess,
see an advantage of releasing products this way.
They continue to do it.
But what did experts tell you so far about the issues that we've seen?
Yeah, so Ethan Mollick, who's a professor at the Wharton School of Business in the U.S., has this really interesting take that, you know, perfection is the wrong standard to use for generative AI.
So something doesn't have to be perfect in order for it to be useful.
So he and his colleagues in the Boston Consulting Group did this really interesting study a while back where they gave some consultants access to GPT-4, which is OpenAI's
latest model, and other consultants did not have access to AI. And they gave them a bunch of tasks
to do. And, you know, to simplify what they found is the consultants who had access to GPT-4
were much more productive and they had higher quality results on a lot of tasks than
the consultants who did not have AI. And these were tasks like come up with 10 ideas for a new
shoe, write some marketing material for it, write a press release for it. So more on the creative
end. But they also designed a task that they knew AI could not do well. And so what they found is
the consultants who used AI, their results were worse, like much worse than consultants who did
not use AI. So you might think that's kind of obvious. Like if a tool isn't up to the job,
of course, the results are going to be worse. But I think the important takeaway is,
if you don't have a good understanding of where the limits of generative AI are,
you will make mistakes. It can be detrimental to you.
I know you spoke to another expert who was talking about something called the error rate,
Joe. I guess this is a little bit about how we trust these tools. Can you tell me about the error rate? Yeah. So just trying to figure out how often an AI model might hallucinate. And so that
was, I was speaking to Melanie Mitchell, who's a computer science professor in the US. And this
ties back to knowing where the limits of AI are. But it's a little tricky because she was saying that, you know, if we know chat GPT makes mistakes 50% of the time, like we won't trust it as much and we will check the output more often, right?
Because we know there could be a lot of mistakes.
But if, you know, it's only 5% of the time or 2% of the time, we won't, right?
Like only 2% of the time, you know, it's chance it's probably fine, but it might not be.
So in that way, you know, more mistakes could slip through.
So she was saying that, you know, the better system in some ways could actually be riskier because we're more likely to trust it.
I mean, this idea of us, how we understand and
how we trust these tools, I think is really, is really fascinating. And I guess I wonder to come
back to how these launches are actually done. Does the way that they're rolled out and they're,
you know, they break and they do all these strange things. Does that actually, I guess,
erode our trust, the public's trust in this technology? It could. If your first exposure to something new is kind of ho-hum, or if it's
influenced by a lot of negative coverage, maybe you won't try it again or try it at all. But the
thing is, we're getting generative AI, whether we like it or not. There's a lot of AI bloat.
What exactly does that mean?
I guess I think of AI bloat as unnecessary AI features
that make products, you know, in some cases more expensive
or just more annoying to use.
You know, like there are laptops coming out
that have a Microsoft Copilot button on them
to get easy access to AI tools.
I think even like in WhatsApp now, right?
You have the meta ask me anything that's there as well.
Yeah, same thing on Instagram.
So the meta AI chatbot is in search.
So it's coming.
But there are still a lot of questions about like, is this better?
Is this something people want?
How does this improve a user's experience? You know, and you have to wonder, too, this company's added more AI features, are they going to have to jack up the price? So it's a value for money question.
Just in our last few minutes here, Joe, so these products are not perfect right now. We've talked about this extensively at this point. Is the understanding, though, that they're going to get better over time, that this is just kind of a temporary phase until we
actually get over this hump to something better? Yeah, I mean, that's the arc of technology
generally. But the question is, you know, how fast? I think a lot of AI developers assume that
this technology is going to improve very quickly. And if you look at the past few years, it certainly has.
Like take, for example, I'm sure a lot of people saw
the AI-generated video of Will Smith eating spaghetti.
Uncle Phil, come try this.
Fresh pasta of Bel-Air.
Which was hilarious, but also horrifying.
Compare that to some of the AI-generated videos now from companies like Runway or OpenAI's Sora,
and the leap in quality is quite astounding.
It's not perfect, but it's huge.
But progress doesn't necessarily continue at the same rate.
The approach to AI now is get a whole lot of data, a whole lot of,
you know, compute or GPUs. And the more data, the more GPUs you have, the better the AI at the other
end of this process. But there are real challenges in getting more data. And, you know, there was a
study earlier this year from Stanford, sort of about the state of AI.
And one of the things that noted is that progress on a lot of benchmarks is kind of stagnated.
And that could be a reflection of diminishing returns of this approach. Progress might not be as linear as some people are assuming.
Just very lastly here, Joe, when we look at these releases, the way things are rolled out here, what does this tell us about how seriously these companies are taking the big questions like ethics, the safety of these tools?
What can we glean from this?
Yeah.
I mean, there has been a lot of concern about the pace of progress and companies releasing AI into the wild. Like last year, there was a very high profile open letter, you know, asking for a six month
pause on development so that regulations could catch up.
And of course, nobody paused, nobody stopped.
And so I guess what we're seeing now with this kind of rush and companies scoring own goals and making mistakes that could have been avoided to some extent with more care and thought,
doesn't necessarily bode well for the future, especially if AI models become more powerful, more sophisticated, more integrated into our lives that arguably carries more risk.
So this is where regulation comes in, why so many people are concerned about regulating AI. I mean,
the EU has passed their AI Act. You know, there's a bill here in Canada, there's lots of efforts in
the US. So, you know, there's an argument to be made that if you want companies to behave
responsibly, you know, take ethics
and safety seriously, you have to force them to through the law.
Joe, this was so interesting. Thank you for being here.
Thanks for having me.
That's it for today. I'm Mainika Raman-Wellms. Kelsey Arnett is our intern. Our producers are
Madeline White, Cheryl Sutherland, and I'll talk to you soon.