The AI Daily Brief: Artificial Intelligence News and Analysis - 5 Reasons DALL-E 3 in ChatGPT is a Huge Deal
Episode Date: September 21, 2023
OpenAI has announced that DALL-E 3 image generation is coming natively to ChatGPT. NLW explores 5 reasons the announcement is a huge deal, including how it represents a significant market-driven acceleration in AI product releases. Before that on the Brief, Amazon is updating Alexa with a dedicated LLM. TAKE OUR SURVEY ON EDUCATIONAL AND LEARNING RESOURCE CONTENT: https://bit.ly/aibreakdownsurvey ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're talking all about OpenAI's huge announcement of DALL-E 3 natively in ChatGPT.
Before that on the Brief, Amazon announces a big Alexa upgrade powered by AI.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube channel, our Discord, and our newsletter.
Welcome back to the AI Breakdown Brief.
All the AI headline news you need in around five minutes.
Well, friends, we are officially in the fall. We are officially in the season of product
announcements. In fact, just before I was recording this, I started to see things coming out of the
Microsoft event in New York. Apparently, a wider Copilot assistant is coming to Windows. Obviously,
we will cover that in full on tomorrow's episode. But for now, we have to catch up with what happened
over the last 24 hours. And we kick off with an announcement around Amazon and its Alexa assistant.
The TLDR is that after many years of being just a medium- or low-powered assistant,
Alexa is now getting a generative AI makeover and will be powered by its own Alexa large language model.
Now, this is something that Amazon has talked about before and was sort of one of the most obvious plays for them,
but that does not change how significant a shift it might be.
At their fall hardware event on Wednesday, Amazon started to premiere the new Alexa.
Dave Limp, Amazon's SVP of devices and services, said,
Alexa LLM is a true generalizable large language model that's very optimized for the Alexa
use case. It's not what you find with a Bard or ChatGPT or any of those things.
Now, in terms of what this will change, the biggest piece of it is that it will be much
more conversational. In the same way that tools like ChatGPT allow you to interact with computers
through a natural language interface, the new Alexa powered by Alexa LLM will also give people
the ability to speak more normally rather than have to adopt some specific nomenclature.
Effectively, you won't have to do the Alexa version of prompt engineering anymore.
Limp said that they funneled more than 200 smart home APIs into their LLM, which will give the new Alexa
much greater capacity to actually interact with your smart home in meaningful ways.
As an example, The Verge writes, this contextual understanding will extend beyond knowing what other
connected devices you might want to control to things like inferring when something's changed
in your home.
Explains Limp, if you add a new device to your home, you can say, Alexa, turn on the new light,
and it will know what the new light is.
It will disambiguate things, so if you put in a new smart plug or light, it will be easier
to control. Another update is that Alexa will be able to respond to multiple requests at once.
So, for example, turn on the sprinklers and open my garage door and turn off the outside lights,
that's something that Alexa can now figure out. That capacity extends to using natural language
to create a routine without having to manually program the Alexa app. The example that Limp uses
is a morning routine for his kid saying, Alexa, every morning at 8 a.m., turn up the light,
play wake-up music for my kid in his bedroom, and start the coffee maker. Now, a few things about
the implementation. One, obviously there is more room for this to go wrong
than the limited, more controllable capacities of Alexa in the past.
Obviously, with any LLM, there is a chance of hallucinations,
and that could be particularly problematic in this type of context.
What that means practically is that the new Alexa will be rolled out only in the U.S. at first
and only through a preview mode, which is effectively a large public beta.
The second relevant note is that this souped-up version of Alexa may not be free in the future.
Limp said, quote,
the idea of a superhuman assistant that can supercharge your smart home
and more, work complex tasks on your behalf, could provide enough utility that we will end up
charging something for it down the road. To me, that sounds like they are definitely going to
charge something for it down the road and they are just preparing people to get used to that
reality now. Anyways, like I said at the top, this is in some ways one of the more predictable updates that
we were going to get from generative AI and the rise of LLMs, the other one that people are waiting
for being a Siri upgrade, but that doesn't make it any less significant for the people who will use it.
Next up, we are getting a number of different announcements in and around the Microsoft ecosystem.
They're having a big event in New York. Like I said, there is information
and news pouring out of that right now. But what got announced yesterday was from one of Microsoft's
subsidiaries, GitHub. That announcement was that GitHub's Copilot Chat, which is the chatbot
assistant that lives in GitHub's developer environment, has now moved out of enterprise and business
usage and is available for individuals who have GitHub Copilot subscriptions. Again, in many ways,
this is an expected product update. We are, as we will discuss even more tomorrow, heading to a
world in which there is going to be an AI-powered chatbot in every application you use, until such
time as it either totally revolutionizes the interface of how we interact with computers,
or it's determined that people actually don't like that interface. But for now, you can expect
this type of experience to come to basically everything. Even if it doesn't work everywhere else,
it seems like it's likely to be a pretty big hit when it comes to coding. And given how significant
the AI assistant use case for coding is, I'm sure that this is an announcement that many will be
pretty excited about. Now, it is not just individuals and businesses who are figuring out how to use
artificial intelligence. The Pennsylvania state government has announced that it will prepare to start using
AI across its operations as well. Speaking at Carnegie Mellon in Pittsburgh, Governor Josh Shapiro said Wednesday
that his administration will be convening an AI governing board, publishing principles on the use of
AI, and developing training programs for state employees. His argument is basically that this is going
to be something significant for Pennsylvanians, and so the government needs to be able to understand
it and adapt to it so that they can make sure that they can help their citizens do the same. As Shapiro
put it, we don't want to let AI happen to us. We want to be a part of helping develop AI for the
betterment of our citizens. Now, this is definitely part of a trend of increased attention being paid
to AI at the state government level. Of course, earlier this month, California Governor Gavin Newsom
signed an executive order to study the development of AI. And beyond that, lawmakers in at least
25 different states have introduced bills that address some aspect of artificial intelligence.
All of which certainly lends credence to recent comments from Ray Dalio that AI will greatly disrupt
our lives within a year. Speaking at Fast Company's Innovation Festival this week,
Ray Dalio, the Bridgewater founder turned public thought leader for boomers, said that AI will soon be a, quote, great disruptor for all of our lives.
All these changes, he said, are going to happen in the next five years. And when I say that, I don't mean five years from now.
I mean that you're going to see changes next year, the next year, even bigger changes. It's all going to change very fast.
Now, of course, while Dalio isn't an AI expert or anything like that, he is a public intellectual of significant repute, especially for that boomer generation that I just mentioned,
and so I think there is significance in the fact that he is talking about these issues as well.
Speaking of talking about these issues, AI continues to get high billing in Congress and the Senate.
Earlier this week, it was the Senate Intelligence Committee's turn to hold a hearing relating to AI.
Invited witnesses included Dr. Benjamin Jensen, senior fellow at CSIS
and professor at the Marine Corps University School of Advanced Warfighting;
Dr. Jeffrey Ding, assistant professor of political science at George Washington University;
and Dr. Yann LeCun, who is a Turing Award winner and the chief AI scientist at Meta.
Now, one of the things that makes Yann different than some of his peers, particularly his fellow
2018 Turing Award winners Geoffrey Hinton and Yoshua Bengio, is that to the extent that
they are scared about the AI safety future, Yann has very strong disagreements with their
points of view. Much of that came out in Yann's opening statement, where he took extensive time
to talk about how important open source really is. In that statement, he said, at Meta,
we believe it is better if AI is developed openly rather than behind closed doors by a handful
of companies. Generally speaking, companies should collaborate across industry, academia, government,
and civil society to help ensure that such technologies are developed responsibly and with
openness to minimize the potential risks and maximize the potential benefits. Yann also argues
that open sourcing is a key geostrategic advantage for the United States. In that opening statement
again, he said, by open sourcing current AI tools, we can develop and improve the foundational
models faster than others, including potential adversaries, will be able to access and build on
those models. With U.S. leadership, we can cultivate this powerful technology based on our values,
rather than relinquishing it to our adversaries. Leading the AI research and development efforts
puts us in a strong position to enhance the safety of our systems and to warn about potential risks.
Now, at the same time this is going on, people were discussing recent comments from Mustafa Suleyman,
the CEO of Inflection AI and a co-founder of Google DeepMind, about why open source could
be problematic. These sides are quickly hardening in one of the most important conversations
in artificial intelligence.
Critch retweeted Jan's opening statement and said,
If we dismiss his points here,
we risk building some kind of authoritarian AI industrial complex in the name of safety.
Extinction from AI is a real potentiality,
but so is the permanent loss of democracy.
Both are bad and all sides of this discussion
need to acknowledge each other to expand our set of options for striking a balance.
If AI is developed, used and regulated responsibly,
it can greatly expand the Pareto frontier of possibilities for humanity's future.
And only through civil discourse like this can we achieve that.
Anyways, that is, of course, a much longer conversation than we can have on the Brief,
but that is going to wrap us for today.
Lots and lots going on, so stay tuned throughout the week for more.
Next up, the main AI Breakdown.
Hey, guys, one more quick note before we dive in.
I so appreciate everyone who has taken the time to fill in our educational content survey.
The TLDR, for those of you who haven't, is that we are considering launching a number of
different types of AI educational content to better help you retrain, re-skill, get to where
you want to be when it comes to AI,
including some really different ideas, such as an AI learning community that would exist on Discord
or some other shared space. But I really need feedback to know what people actually need and what people
most want. And if you are willing to take less than one minute to give me that feedback, I would be ever
grateful. Just go to bit.ly slash AI breakdown survey. And like I said, it'll take you less than a
minute to fill that out. Thanks in advance and let's get to today's show.
Welcome back to the AI Breakdown. Today we are talking all about the announcement
from OpenAI yesterday that DALL-E 3 was on its way and that it was going to be integrated directly into ChatGPT.
On this episode, I'm going to go through five reasons why I think this is such a big deal.
And while I am designing this for podcast consumption, I also would highly suggest that if you don't yet,
you go subscribe to the YouTube channel as well, as this is obviously inherently a pretty visual story.
By way of previewing the conversation, here are the five reasons I think it's such a big deal.
The first is the advance in quality in one of the most important AI areas of image generation.
The second is its integration with ChatGPT, which has both a user experience as well as a
mainstreaming dimension.
The third is the implications for multimodality as the next iteration of generative AI.
The fourth is its impact on competitors like Midjourney.
And the fifth is what I'm calling competitive accelerationism, or the increase in speed of development
and release based on market competition.
How that's good and how it also might be not so good.
Let's look at the announcement first.
Yesterday OpenAI tweeted, our new text-to-image model, DALL-E 3,
can translate nuanced requests into extremely detailed and accurate images. Coming soon to ChatGPT
Plus and Enterprise, which can help you craft amazing prompts to bring your ideas to life.
Believe it or not, there is tons in there, and many of those themes are echoed on their basic
product landing page. DALL-E 3, OpenAI writes, understands significantly more nuance and detail
than our previous systems, allowing you to easily translate your ideas into exceptionally accurate
images. The first reason that DALL-E 3 is significant is the simple fact that in many ways it seems to be
an advance on the current state of the art, which is Midjourney 5.2, which came out earlier this year
in June. I'm going to get into a bit more of a comparison between the two in just a moment,
but one of the things that DALL-E 3 seems to do really well, at least on first glance, and at least
with the demo images that have been shared, is handling nuance and detail. It feels
more precise, and because it's more precise, it ends up being more expressive. When a model is better
able to understand, for example, inspirations or criteria or styles such as pointillism,
or cardboard cutouts or whatever it is, the end result is a lot closer to what the prompter
imagines. Now, the other piece of this, again, compared to Midjourney specifically, is the fact that
DALL-E 3 apparently handles text much better. One of the examples they give is a poster that says
Explore Venus, which is something that you just can't do right now on Midjourney, although you can do
it with some other models, including those from Stability AI. Anyway, the TLDR for this argument is that
if this was the only advance, just a more precise, higher-powered image generation model, that would still
be significant. However, that's not the only thing about this, and in fact, in some ways, it's
overshadowed by where this exists and how people are going to interact with it. DALL-E 3 is built
natively into ChatGPT, and that means a number of different things. First of all, it means
that it's more natively suited to natural language prompting than the sort of prompt
engineering that people have to do with tools like Midjourney. Indeed, OpenAI says that part of what
makes DALL-E 3 so much more expressive and so much better able to handle nuance is the fact that it is
natively built on GPT. On the product page, this is one of the biggest highlights that they make.
OpenAI writes, modern text-to-image systems have a tendency to ignore words or descriptions,
forcing users to learn prompt engineering. DALL-E 3 represents a leap forward in our ability to generate
images that exactly adhere to the text you provide. The example they gave is a nighttime
illustration of what might be mid-20th century London, in which they point out how a number of
different parts of the natural language prompt come to life. A sidewalk bustling with pedestrians enjoying
the nightlife, a bustling street under the shine of a full moon. At the corner stall, a young
woman with fiery red hair dressed in a signature velvet cloak is haggling with the grumpy old
vendor. The grumpy vendor, a talented man, is wearing a sharp suit, sports a noteworthy mustache,
and is animatedly conversing on a steampunk telephone. Now, one thing that should be noted is that
right now, all we have in terms of understanding the quality of DALL-E 3 are the highly
curated and selected examples that OpenAI has given us. So far as I have seen, the company
has promised that this will be rolling out into ChatGPT Plus and Enterprise in the near future,
but so far, I haven't seen anybody that actually has access to it.
Which means, of course, that we have to take any big claims with a grain of salt until we can test it ourselves.
Still, if DALL-E 3 really can handle this level of linguistic nuance when it comes to prompting,
it is a total game changer relative to what you get from something like Midjourney.
The other piece of that is how ChatGPT might actually be a useful assistant in refining prompts.
Again, OpenAI writes, DALL-E 3 is built natively on ChatGPT,
which lets you use ChatGPT as a brainstorming partner and refiner of prompts.
Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph.
When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL-E 3 that bring your idea to life.
If you like a particular image but it's not quite right, you can ask ChatGPT to make tweaks with just a few words.
Now, the other important piece of the argument about why it's so transformative to be integrated natively with ChatGPT is of course just distribution.
Right now, the biggest player in this text-to-image space is Midjourney, and to use Midjourney, you basically have to live inside Discord.
That is a huge barrier for normal consumers, and the fact that there are already hundreds of millions
of users of ChatGPT that now might just natively have access to DALL-E, although yes, of course,
it's just for Plus and Enterprise users at first, is obviously a really big deal as well.
Now, obviously the other piece of the integration with ChatGPT, which I've decided to
break out as a third reason that this is a really big deal, is that this is a clear first step
towards multimodality.
Quantum physicist Kevin Fisher wrote,
DALL-E 3 is a massive step change improvement to put DALL-E in chat.
A year ago, pre-ChatGPT,
even I built an alpha that combined chat with images.
A small group of people felt this magic together and still reminisce.
This is a 100% revolutionary release from OpenAI.
Kevin quote tweeted himself from October 19th of last year,
when he wrote prompt engineering is not the future,
the future looks more like this.
In it, he shares an example of this alpha,
which has a chat style interface that interacts with images as well.
Dr. Jim Fan from NVIDIA is even more precise in his articulation
of what DALL-E 3 means in terms of multimodality.
He says, I think DALL-E 3 is not just a stance against Midjourney, it's actually a sneak
peek of the upcoming epic battle of massively multimodal LLMs against DeepMind Gemini.
Quote, DALL-E 3 is built natively on ChatGPT.
This is the key phrase.
DALL-E 3's extraordinary language alignment is built on a solid textual GPT foundation.
Midjourney doesn't really have much reasoning brain, which is why so much prompt hacking
is needed.
Brain first, pixel second.
That's the way to build a strong multimodal AI.
So the third reason that this is such a big deal is that it's a first
step toward, and a preview of, the multimodal future.
Now, one thing that has been lurking behind all of this, and the fourth reason it's a big deal, is its impact
on immediate competitors like Midjourney.
Chase Lean tweeted, OpenAI just announced DALL-E 3, their new AI image generator.
Will it replace Midjourney?
I think it all comes down to two things.
One, how good is ChatGPT at understanding the user and writing prompts for DALL-E?
Two, how good will Midjourney version 6 be?
As Chase points out, expert Midjourney users know how to write prompts, but they're concerned
about things like image quality, prompt adherence, image resolution, and character consistency.
Based on the recent demo, I'd say that DALL-E and Midjourney version 5.2 have similar quality.
But here's the thing.
DALL-E is way better at prompt adherence.
Unlike Midjourney, DALL-E generates almost everything requested inside the prompt.
It can also make some decent text.
Combined with GPT, which is amazingly good at writing descriptions, this makes DALL-E truly
formidable.
A huge win for OpenAI.
But Midjourney version 6 is on the horizon and is rumored to have much better image quality,
prompt adherence, and an upscaler to increase image resolution.
For the past two years, superior image quality has
kept Midjourney in front of Stable Diffusion.
If Midjourney version 6 is able to keep making the best images,
it will continue being the AI image generator of choice.
I think that that's an accurate assessment.
I can only speak for myself and the people that I observe,
but I make probably 50 to 100 Midjourney images per day.
Some of them are for work,
many of them are just for fun or for creativity or for exploration.
I can say beyond a shadow of a doubt,
that if Midjourney version 6 produces what I perceive to be better results
and higher quality images than DALL-E 3,
DALL-E 3 is not going to replace Midjourney, at least not fully, for me.
Convenience is not enough, even if convenience gives DALL-E 3 a big edge when it comes to bringing
more people into the image generation space in the first place.
More than that, I think that obviously this announcement has to light a huge fire under Midjourney's
butts to advance things more quickly.
Nick St. Pierre went viral with a tweet that said simply,
Midjourney needs to get out of Discord ASAP, and that's a pretty obvious one.
A native app for Midjourney is no longer really optional.
It's also going to force them to try to reach parity with things like text generation.
And so I think if nothing else, you can expect that the next
version of Midjourney is going to be a significant amount better, or they risk losing the race entirely.
That gets us, however, to our fifth big point, which is maybe the most significant in some ways,
and that's what I'm calling competitive accelerationism.
Professor Ethan Mollick writes, it looks like the Gemini quickening has begun.
With the upcoming release of the first model to likely beat GPT-4, at least temporarily,
you can see a burst of announcements: GPT-4 multimodal and DALL-E 3, Bard integrations, likely more.
Competition in the space is increasing velocity.
Now, of course, to the extent that I just mentioned the positive impacts of that, Midjourney getting better because of that competitive pressure,
there are many who are concerned about exactly this type of competitive accelerationism.
It was one of the big reasons that Geoffrey Hinton said he left Google,
that the market pressure to compete and move faster was making companies act less safely when it came to AI releases,
and physicist and MIT professor Max Tegmark said something very similar today.
A Guardian piece came out this morning called AI-focused tech firms locked in race to the bottom, warns MIT professor.
Physicist Max Tegmark says competition is too intense for tech executives to pause development to consider AI risks.
Now, the piece itself wasn't specifically about this release of DALL-E 3, but more about this broader
trend that Professor Mollick was just noticing. You have every big tech company now firing on all cylinders
to get ahead, or in the case of OpenAI to stay ahead, and that works at cross purposes with slow,
deliberate, considered AI safety discussions. It even works at cross purposes with preventative mechanisms
such as red teaming. Now, all in all, this is a much lower priority right now for the average person
who's interacting with this DALL-E 3 news. Like I said, I think people are more excited about this
product announcement than anything I've seen in months, maybe since GPT-4. But it is the background
noise. It is a piece of this discussion that can't be ignored. And I think that Professor Mollick
is right to identify that the phase that we're entering, in which GPT-4 is no longer
out of reach, but matched or exceeded, is one which will have dramatic impacts on the shape of the industry
because of this competitive accelerationism.
Still, when it comes to DALL-E 3 specifically,
it is very, very hard for me not to be just fully excited, ready to dive in.
And so October, when they say this will begin rolling out in ChatGPT Plus,
can't come soon enough.
Until then, I will be watching to see what people inside OpenAI create,
and I will report back my favorite findings.
If you want to share your text-to-image images,
come join us on the AI Breakdown Discord.
It's a great place for exactly that sort of conversation.
I just dumped in a bunch of custom Magic cards I created as an example.
You can find a link to the Discord at bit.ly slash AI breakdown, and I'll see you there.
Thanks as always for listening or watching. Until next time, peace.
