The AI Daily Brief: Artificial Intelligence News and Analysis - 5 Reasons DALL-E 3 in ChatGPT is a Huge Deal

Starting point is 00:00:00 Today on the AI Breakdown, we're talking all about OpenAI's huge announcement of DALL-E 3 natively in ChatGPT. Before that, on the brief, Amazon announces a big Alexa upgrade powered by AI. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube channel, our Discord, and our newsletter. Welcome back to the AI Breakdown brief. All the AI headline news you need in around five minutes. Well, friends, we are officially in the fall. We are officially in the season of product announcements. In fact, just before I was recording this, I started to see things coming out of the

Starting point is 00:00:42 Microsoft event in New York. Apparently, a wider Copilot assistant is coming to Windows. Obviously, we will cover that in full on tomorrow's episode. But for now, we have to catch up with what happened over the last 24 hours. And we kick off with an announcement around Amazon and its Alexa assistant. The TLDR is that after many years of being just a medium or low-powered assistant, Alexa is now getting a generative AI makeover and will be powered by its own Alexa large language model. Now, this is something that Amazon has talked about before and was sort of one of the most obvious plays for them, but that does not change how significant a shift it might be. At their fall hardware event on Wednesday, Amazon started to premiere the new Alexa.

Starting point is 00:01:21 Dave Limp, Amazon's SVP of devices and services, said, Alexa LLM is a true generalizable large language model that's very optimized for the Alexa use case. It's not what you find with a Bard or ChatGPT or any of those things. Now, in terms of what this will change, the biggest piece of it is that it will be much more conversational. In the same way tools like ChatGPT allow you to interact with computers through a natural language interface, the new Alexa powered by Alexa LLM will also give people the ability to speak more normally rather than have to adopt some specific nomenclature. Effectively, you won't have to do the Alexa version of prompt engineering anymore.

Starting point is 00:01:54 Limp said that they funneled more than 200 smart home APIs into their LLM, which will give the new Alexa much greater capacity to actually interact with your smart home in meaningful ways. As an example, The Verge writes, this contextual understanding will extend beyond knowing what other connected devices you might want to control to things like inferring when something's changed in your home. Explains Limp, if you add a new device to your home, you can say, Alexa, turn on the new light, and it will know what the new light is. It will disambiguate things, so if you put in a new smart plug or light, it will be easier

Starting point is 00:02:22 to control. Another update is that Alexa will be able to respond to multiple requests at once. So, for example, turn on the sprinklers and open my garage door and turn off the outside lights, that's something that Alexa can now figure out. That capacity extends to using natural language to create a routine without having to manually program the Alexa app. The example that Limp uses is a morning routine for his kid, saying, Alexa, every morning at 8 a.m., turn up the light, play wake-up music for my kid in his bedroom, and start the coffee maker. Now, a few things about the implementation. One, obviously there is more room for this to go wrong than with the limited, more controllable capacities of Alexa in the past.

Starting point is 00:02:56 Obviously, with any LLM, there is a chance of hallucinations, and that could be particularly problematic in this type of context. What that means practically is that the new Alexa will be rolled out only in the U.S. at first and only through a preview mode, which is effectively a large public beta. The second relevant note is that this souped-up version of Alexa may not be free in the future. Limp said, quote, the idea of a superhuman assistant that can supercharge your smart home and, more, work complex tasks on your behalf could provide enough utility that we will end up

Starting point is 00:03:22 charging something for it down the road. To me, that sounds like they are definitely going to charge something for it down the road and they are just preparing people to get used to that reality now. Anyways, like I said at the top, this is in some ways one of the more predictable updates that we were going to get from generative AI and the rise of LLMs, the other one that people are waiting for being a Siri upgrade, but that doesn't make it any less significant for the people who will use it. Next up, we are getting a number of different announcements in and around the Microsoft ecosystem. They're having a big event in New York. Like I said, there is information and news pouring out of that right now. But what got announced yesterday was from one of Microsoft's

Starting point is 00:03:56 subsidiaries, GitHub. That announcement was that GitHub's Copilot Chat, which is the chatbot assistant that lives in GitHub's developer environment, has now moved out of enterprise and business usage and is available for individuals who have GitHub Copilot subscriptions. Again, in many ways, this is an expected product update. We are, as we will discuss even more tomorrow, heading to a world in which there is going to be an AI-powered chatbot in every application you use, until such time as it either totally revolutionizes the interface of how we interact with computers, or it's determined that people actually don't like that interface. But for now, you can expect this type of experience to come to basically everything. Even if it doesn't work everywhere else,

Starting point is 00:04:31 it seems like it's likely to be a pretty big hit when it comes to coding. And given how significant the AI assistant for coding use case is, I'm sure that this is an announcement that many will be pretty excited about. Now, it is not just individuals and businesses who are figuring out how to use artificial intelligence. The Pennsylvania state government has announced that it will prepare to start using AI across its operations as well. Speaking at Carnegie Mellon in Pittsburgh, Governor Josh Shapiro said Wednesday that his administration will be convening an AI governing board, publishing principles on the use of AI, and developing training programs for state employees. His argument is basically that this is going to be something significant for Pennsylvanians, and so the government needs to be able to understand

Starting point is 00:05:08 it and adapt to it so that they can make sure that they can help their citizens do the same. As Shapiro put it, we don't want to let AI happen to us. We want to be a part of helping develop AI for the betterment of our citizens. Now, this is definitely part of a trend of increased attention being paid to AI at the state government level. Of course, earlier this month, California Governor Gavin Newsom signed an executive order to study the development of AI. And beyond that, lawmakers in at least 25 different states have introduced bills that address some aspect of artificial intelligence. All of which certainly lends credence to recent comments from Ray Dalio that AI will greatly disrupt our lives within a year. Speaking at Fast Company's Innovation Festival this week,

Starting point is 00:05:45 the Bridgewater founder turned public thought leader for boomers, Ray Dalio said that AI will soon be a, quote, great disruptor for all of our lives. All these changes, he said, are going to happen in the next five years. And when I say that, I don't mean five years from now. I mean that you're going to see changes next year, the next year, even bigger changes. It's all going to change very fast. Now, of course, while Dalio isn't an AI expert or anything like that, he is a public intellectual of significant repute, especially for that boomer generation that I just mentioned, and so I think there is significance that he is talking about these issues as well. Speaking of talking about these issues, AI continues to get high billing in Congress and the Senate. Earlier this week, it was the Senate Intelligence Committee's turn to hold a hearing relating to AI. Invited witnesses included Dr. Benjamin Jensen, senior fellow at CSIS,

Starting point is 00:06:29 and professor at the Marine Corps University School of Advanced Warfighting, Dr. Jeffrey Ding, assistant professor of political science at George Washington University, and Dr. Yann LeCun, who is a Turing Award winner and the chief AI scientist at Meta. Now, one of the things that makes Yann different than some of his peers, particularly his other 2018 Turing Award-winning fellows Geoffrey Hinton and Yoshua Bengio, is that to the extent that they are scared about the AI safety future, Yann has very strong disagreements with their points of view. Much of that came out in Yann's opening statement, where he took extensive time to talk about how important open source really is. In that statement, he said, at Meta,

Starting point is 00:07:03 we believe it is better if AI is developed openly rather than behind closed doors by a handful of companies. Generally speaking, companies should collaborate across industry, academia, government, and civil society to help ensure that such technologies are developed responsibly and with openness to minimize the potential risks and maximize the potential benefits. Yann also argues that open sourcing is a key geostrategic advantage for the United States. In that opening statement again, he said, by open sourcing current AI tools, we can develop and improve the foundational models faster than others, including potential adversaries, will be able to access and build on those models. With U.S. leadership, we can cultivate this powerful technology based on our values,

Starting point is 00:07:40 rather than relinquishing it to our adversaries. Leading the AI research and development efforts puts us in a strong position to enhance the safety of our systems and to warn about potential risks. Now, at the same time this is going on, people were discussing recent comments from Mustafa Suleyman, the CEO of Inflection AI and a co-founder of Google DeepMind, about why open source could be problematic. These sides are quickly hardening in one of the most important conversations in artificial intelligence. Critch retweeted Yann's opening statement and said, If we dismiss his points here,

Starting point is 00:08:10 we risk building some kind of authoritarian AI industrial complex in the name of safety. Extinction from AI is a real potentiality, but so is the permanent loss of democracy. Both are bad and all sides of this discussion need to acknowledge each other to expand our set of options for striking a balance. If AI is developed, used and regulated responsibly, it can greatly expand the Pareto frontier of possibilities for humanity's future. And only through civil discourse like this can we achieve that.

Starting point is 00:08:34 Anyways, that is, of course, a much longer conversation than we can have on the brief, but that is going to wrap us for today. Lots and lots going on, so stay tuned throughout the week for more. Next up, the main AI Breakdown. Hey, guys, one more quick note before we dive in. I so appreciate everyone who has taken the time to fill in our educational content survey. The TLDR, for those of you who haven't, is that we are considering launching a number of different types of AI educational content to better help you retrain, re-skill, get to where

Starting point is 00:09:04 you want to be when it comes to AI, including some really different ideas, such as an AI learning community that would exist on Discord or some other shared space. But I really need feedback to know what people actually need and what people most want. And if you are willing to take less than one minute to give me that feedback, I would be ever grateful. Just go to bit.ly slash AI breakdown survey. And like I said, it'll take you less than a minute to fill that out. Thanks in advance and let's get to today's show. Welcome back to the AI Breakdown. Today we are talking all about the announcement from OpenAI yesterday that DALL-E 3 was on its way and it was going to be integrated directly into ChatGPT.

Starting point is 00:09:42 On this episode, I'm going to go through five reasons why I think this is such a big deal. And while I am designing this for podcast consumption, I also would highly suggest that if you don't yet, you go subscribe to the YouTube channel as well, as this is obviously inherently a pretty visual story. By way of previewing the conversation, here are the five reasons I think it's such a big deal. The first is the advance in quality in one of the most important AI areas, image generation. The second is its integration with ChatGPT, which has both a user experience as well as a mainstreaming dimension. The third is the implications for multimodality as the next iteration of generative AI.

Starting point is 00:10:16 Fourth is its impact on competitors like Midjourney. And five is what I'm calling competitive accelerationism, or the increase in speed of development and release based on market competition. How that's good and how it also might be not so good. Let's look at the announcement first. Yesterday, OpenAI tweeted, our new text-to-image model, DALL-E 3, translates nuanced requests into extremely detailed and accurate images. Coming soon to ChatGPT Plus and Enterprise, which can help you craft amazing prompts to bring your ideas to life.

Starting point is 00:10:44 Believe it or not, there is tons in there, and many of those themes are announced on their basic product landing page. DALL-E 3, OpenAI writes, understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into exceptionally accurate images. The first reason that DALL-E 3 is significant is the simple fact that in many ways it seems to be an advance on the current state of the art, which is Midjourney 5.2, which came out earlier this year in June. I'm going to get into a bit more of a comparison between the two in just a moment, but one of the things that DALL-E 3 seems to do really well, at least on first glance, and at least with the demo images that have been shared, is handling nuance and detail. It feels

Starting point is 00:11:21 more precise, and because it's more precise, it ends up being more expressive. When a model is better able to understand, for example, inspirations or criteria or styles such as pointillism or cardboard cutouts or whatever it is, the end result is a lot closer to what the prompter imagines. Now, the other piece of this, again, compared to Midjourney specifically, is the fact that DALL-E 3 apparently handles text much better. One of the examples they give is a poster that says Explore Venus, which is something that you just can't do right now on Midjourney, although you can do it with some other models, including those from Stability AI. Anyway, the TLDR for this argument is that if this was the only advance, just a more precise, higher-powered image generation model, that would still

Starting point is 00:11:59 be significant. However, that's not the only thing about this, and in fact, in some ways, it's overshadowed by where this exists and how people are going to interact with it. DALL-E 3 is built natively into ChatGPT, and that means a number of different things. First of all, it means that it's more natively suited to natural language prompting than the sort of prompt engineering that people have to do with tools like Midjourney. Indeed, OpenAI says that part of what makes DALL-E 3 so much more expressive and so much better able to handle nuance is the fact that it is natively built on GPT. On the product page, this is one of the biggest highlights that they make. OpenAI writes, modern text-to-image systems have a tendency to ignore words or descriptions,

Starting point is 00:12:36 forcing users to learn prompt engineering. DALL-E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. The example they gave is a nighttime illustration of what might be mid-20th century London, in which they point out how a number of different parts of the natural language prompt come to life. A sidewalk bustling with pedestrians enjoying the nightlife, a bustling street under the shine of a full moon. At the corner stall, a young woman with fiery red hair dressed in a signature velvet cloak is haggling with the grumpy old vendor. The grumpy vendor, a tall, sophisticated man, is wearing a sharp suit, sports a noteworthy mustache, and is animatedly conversing on a steampunk telephone. Now, one thing that should be noted is that

Starting point is 00:13:11 right now, all we have in terms of understanding the quality of DALL-E 3 are the highly curated and selected examples that OpenAI has given us. So far as I have seen, the company has promised that this will be rolling out into ChatGPT Plus and Enterprise in the near future, but so far, I haven't seen anybody that actually has access to it. Which means, of course, that we have to take any big claims with a grain of salt until we can test it ourselves. Still, if DALL-E 3 really can handle this level of linguistic nuance when it comes to prompting, it is a total game changer relative to what you get from something like Midjourney. The other piece of that is how ChatGPT might actually be a useful assistant in refining prompts.

Starting point is 00:13:46 Again, OpenAI writes, DALL-E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL-E 3 that bring your idea to life. If you like a particular image but it's not quite right, you can ask ChatGPT to make tweaks with just a few words. Now, the other important piece of the argument about why it's so transformative to be integrated natively with ChatGPT is of course just distribution. Right now, the biggest player in this text-to-image space is Midjourney, and to use Midjourney, you basically have to live inside Discord. That is a huge barrier for normal consumers, and the fact that there are already hundreds of millions

Starting point is 00:14:27 of users of ChatGPT that now might just natively have access to DALL-E, although yes, of course, it's just for Plus and Enterprise users at first, is obviously a really big deal as well. Now, obviously the other piece of the integration with ChatGPT, which I've decided to break out as a third reason that this is a really big deal, is that this is a clear first step towards multimodality. Quantum physicist Kevin Fischer wrote, DALL-E 3 is a massive step change improvement to put DALL-E in chat. A year ago, pre-ChatGPT,

Starting point is 00:14:53 even I built an alpha that combined chat with images. A small group of people felt this magic together and still reminisce. This is a 100% revolutionary release from OpenAI. Kevin quote tweeted himself from October 19th of last year, when he wrote prompt engineering is not the future, the future looks more like this. In it, he shares an example of this alpha, which has a chat style interface that interacts with images as well.

Starting point is 00:15:15 Dr. Jim Fan from NVIDIA is even more precise in his articulation of what DALL-E 3 means in terms of multimodality. He says, I think DALL-E 3 is not just a stance against Midjourney, it's actually a sneak peek of the upcoming epic battle of massively multimodal LLMs against DeepMind Gemini. Quote, DALL-E 3 is built natively on ChatGPT. This is the key phrase. DALL-E 3's extraordinary language alignment is built on a solid textual GPT foundation. Midjourney doesn't really have much reasoning brain, which is why so much prompt hacking

Starting point is 00:15:43 is needed. Brain first, pixel second. That's the way to build a strong multimodal AI. So the third reason that this is such a big deal is that it's a first step and a preview of the multimodal future. Now, one thing that has been lurking behind, and the fourth reason it's a big deal, is its impact on immediate competitors like Midjourney. Chase Lean tweeted, OpenAI just announced DALL-E 3, their new AI image generator.

Starting point is 00:16:04 Will it replace Midjourney? I think it all comes down to two things. One, how good is ChatGPT at understanding the user and writing prompts for DALL-E? Two, how good will Midjourney version 6 be? As Chase points out, expert Midjourney users know how to write prompts, but they're concerned about things like image quality, prompt adherence, image resolution, character consistency. Based on the recent demo, I'd say that DALL-E and Midjourney version 5.2 have similar quality. But here's the thing.

Starting point is 00:16:27 DALL-E is way better at prompt adherence. Unlike Midjourney, DALL-E generates almost everything requested inside the prompt. It can also make some decent text. Combined with GPT, which is amazingly good at writing descriptions, this makes DALL-E truly formidable. A huge win for OpenAI. But Midjourney version 6 is on the horizon and is rumored to have much better image quality, prompt adherence, and an upscaler to increase image resolution.

Starting point is 00:16:49 For the past two years, superior image quality has kept Midjourney in front of Stable Diffusion. If Midjourney version 6 is able to keep making the best images, it will continue being the AI image generator of choice. I think that that's an accurate assessment. I can only speak to myself and the people that I observe, but I make probably 50 to 100 Midjourney images per day. Some of them are for work,

Starting point is 00:17:08 many of them are just for fun or for creativity or for exploration. I can say beyond a shadow of a doubt that if Midjourney version 6 produces what I perceive to be better results and higher quality images than DALL-E 3, it's not going to replace Midjourney, at least not fully, for me. Convenience is not enough, even if convenience gives DALL-E 3 a big edge when it comes to bringing more people into the image generation space in the first place. More than that, I think that obviously this announcement has to light a huge fire under Midjourney's

Starting point is 00:17:34 butts to advance things more quickly. Nick St. Pierre went viral with a tweet that said simply, Midjourney needs to get out of Discord ASAP, and that's a pretty obvious one. A native app for Midjourney is no longer really optional. It's also going to force them to try to reach parity with things like text generation. And so I think if nothing else, you can expect that the next version of Midjourney is going to be a significant amount better, or they risk losing the race entirely. That gets us, however, to our fifth big point, which is maybe the most significant in some ways,

Starting point is 00:18:01 and that's what I'm calling competitive accelerationism. Professor Ethan Mollick writes, it looks like the Gemini quickening has begun. With the upcoming release of the first model to likely beat GPT-4, at least temporarily, you can see a burst of announcements, GPT-4 multimodal and DALL-E 3, Bard integrations, likely more. Competition in the space is increasing velocity. Now, of course, to the extent that I just mentioned the positive impacts of that, Midjourney getting better because of that competitive pressure, there are many who are concerned about exactly this type of competitive accelerationism. It was one of the big reasons that Geoffrey Hinton said he left Google,

Starting point is 00:18:34 that the market pressure to compete and move faster was making companies act less safely when it came to AI releases, and physicist and MIT professor Max Tegmark said something very similar today. A Guardian piece came out this morning called AI-focused tech firms locked in race to the bottom, warns MIT professor. Physicist Max Tegmark says competition is too intense for tech executives to pause development to consider AI risks. Now, the piece itself wasn't specifically about this release of DALL-E 3, but more about this broader trend that Professor Mollick was just noticing. You have every big tech company now firing on all cylinders to get ahead, or in the case of OpenAI to stay ahead, and that works at cross purposes with slow, deliberate, considered AI safety discussions. It even works at cross purposes with preventative mechanisms

Starting point is 00:19:16 such as red teaming. Now, all in all, this is a much lower priority right now for the average person who's interacting with this DALL-E 3 news. Like I said, I think people are more excited about this product announcement than anything I've seen in months, maybe since GPT-4. But it is the background noise. It is a piece of this discussion that can't be ignored. And I think that Professor Mollick is right to identify that the phase that we're entering, in which GPT-4 becomes no longer inaccessible, but matched or exceeded, is one which will have dramatic impacts on the shape of the industry because of this competitive accelerationism. Still, when it comes to DALL-E 3 specifically,

Starting point is 00:19:49 it is very, very hard for me not to be just fully excited, ready to dive in. And so October, when they say this will begin rolling out in ChatGPT Plus, can't come soon enough. Until then, I will be watching to see what people inside OpenAI create, and I will report back my favorite findings. If you want to share your text-to-image images, come join us on the AI Breakdown Discord. It's a great place for exactly that sort of conversation.

Starting point is 00:20:12 I just dumped in a bunch of custom Magic cards I created, for example. You can find a link to the Discord at bit.ly slash AI breakdown, and I'll see you there. Thanks as always for listening or watching. Until next time, peace.
