The AI Daily Brief: Artificial Intelligence News and Analysis - Meta Launches Midjourney, Dall-E-3 Competitor

Episode Date: December 8, 2023

How does the new tool stand up to other image generators? And what else did Meta AI announce? The day following Google's Gemini was a mixed bag for the company. On the one hand, socials acted betrayed... on discovering the most impressive demo video had sort of been staged. On the other hand, Wall Street drove the stock up 5+% - adding $80B to the company's market cap. Interested in the January AI Education Beta program? Learn more and sign up for the waitlist here - https://bit.ly/aibeta ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:01 Today on the AI Breakdown, we're talking about Meta's new AI announcements, including their Imagine image generator. Before that on the Brief, more thoughts on Google Gemini. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
Starting point is 00:00:30 Well, friends, obviously the big story from this week was the surprise announcement of Google Gemini. We discussed in that immediate reaction video a little bit about how, while initially everyone's reaction was to be impressed that there was finally a GPT-4-level competitor, pretty quickly there started to be some questions. The much-touted MMLU results that showed Gemini not only beating GPT-4, but also beating human expertise, weren't really carried out in the same way that tests including GPT-4 were carried out, which was a little bit of a wet blanket on that announcement. And then on Thursday, a lot of the chatter on Twitter and in other parts of the internet was about how one of the most shared demo videos from the launch, which
Starting point is 00:01:10 showed a company representative speaking to Gemini as it identified drawings in real time, was basically entirely edited and wasn't really at all what it seemed. The video that I'm referring to is of course the one where a human is drawing something that starts off as squiggly lines and then becomes a duck, and Gemini seems to respond in kind, explaining what's going on, and then it moves to inventing a game, then it solves visual puzzles. Well, basically, it didn't happen as it was presented, which made it seem like the person was just speaking to Gemini right there.
Starting point is 00:01:40 Instead, as Bloomberg Opinion columnist Parmy Olson put it, Google filmed those hands doing a bunch of things and then showed stills of the footage to Gemini one by one. There was no voice conversation but a text exchange like you'd have with ChatGPT or Bard. Google's video made it look like you could show different things to Gemini Ultra in real time and talk to it. You can't.
Starting point is 00:01:58 The voice you hear in the video is just reading the text prompts. So again, all of this just served to throw a little cold water on this, but ultimately the bigger point still remains that we just don't have Ultra yet, and until we do, there are going to be major questions. Yet at the same time, even though Google's stock price was initially flat on the announcements, Thursday saw a significant uptick as Wall Street digested the news. Alphabet shares ended 5.3% higher on the day, with analysts believing that the new Gemini model could narrow the gap with OpenAI and, by extension, Microsoft. JPMorgan analysts wrote, Google is beginning to address investor concerns around generative AI and the high cost of running Gen AI models through the combination of Gemini's different model sizes. Other analysts noted that the introduction of Gemini comes at a time when some people are more skeptical of OpenAI's models. Wrote Macquarie analysts, Gemini's release comes at an
Starting point is 00:02:49 interesting point in time where OpenAI and ChatGPT users have been complaining about how updates to the GPT model family have potentially impacted the quality of its output. If Google is shipping a GPT-4-beating model, this could help gather user and developer momentum behind Google. Now, staying in Big Tech land for a moment, Amazon CEO Andy Jassy didn't announce anything new, but in an interview with Jim Cramer, he did say that generative AI will, quote, change every customer experience. He said, if you've studied generative AI and you're still scoffing, you're really not paying attention. Generative AI is going to change every customer experience and it's going to make it much more accessible for everyday developers and even business users to use.
Starting point is 00:03:28 So I think there's going to be a lot of societal good. He also liked Amazon's prospects in the AI race saying, We think we have a real opportunity to be a leader there, and we're in the process of building a much more expansive large language model underneath Alexa that will make her both much more knowledgeable and much more conversational. Is this the large Olympus model that has been rumored? We will just have to wait and see. Now, of course, it's not just the big tech companies that are exploring how to use AI,
Starting point is 00:03:52 but the CPG companies and the restaurants. For example, The Verge writes, McDonald's will use Google AI to make sure your fries are fresh or something. The fast food company says it will be applying generative AI to its operations starting in 2024. The TLDR, McDonald's is partnering with Google to bring generative AI to, quote, thousands of stores through hardware and software upgrades. That includes ordering kiosks, the company's mobile app,
Starting point is 00:04:16 and data science that helps them optimize operations. The company claims one of the outcomes of that will be hotter, fresher food, to which The Verge says, it's not completely clear what that means, but we can read between the lines. Expect more AI-driven automation at a drive-thru near you in the coming years. The Verge also notes, quote, The McDonald's statement skirts the question of AI replacing human workers, mentioning only that the system should reduce complexity for store crews and that it will, quote, power exciting new experiences for crews and customers.
Starting point is 00:04:44 There's a whiff of robots replacing human workers in the air, but some other research suggests that, while very real, the value proposition of AI for drive-thru is actually driven by real people. Writes Bloomberg, Checkers and Carl's Jr. are among U.S. fast food chains hailing AI-powered drive-thrus as labor-zapping wizards that speed up service. But a popular provider of these systems recently revealed a crucial part of how it gets so many orders right. Humans. Basically, this company, Presto Automation, has a chatbot that's meant to take orders with very little human intervention. However, disclosures from recent SEC filings say that,
Starting point is 00:05:17 quote, off-site agents working in locales such as the Philippines helped during more than 70% of customer interactions to make sure AI systems don't mess up. This is a humans-in-the-loop approach to AI that I think we're going to see a lot more as companies figure out how to actually bring artificial intelligence into their workflows. Here's a wild one for a guy like me
Starting point is 00:05:35 whose first AOL screen name in 1995 was Kurt C. Freak. The original bassist and founder of Nirvana, Krist Novoselic, recently served as the spokesperson for a Microsoft shareholder initiative asking the Redmond giant to study and report on the impact of its AI initiatives. Novoselic, on behalf of these shareholders, argues that Microsoft has rushed generative AI products to market and hasn't given enough consideration to guardrails. In a recorded video message, he said, when Microsoft released its generative AI powered Bing last February, numerous AI experts and investors expressed concern. Many urged Microsoft to pause and consider all the risks associated with this new
Starting point is 00:06:11 technology so that the company could establish risk mitigation practices. Yet our company raced forward releasing this nascent technology without the appropriate guardrails. Novoselic has apparently been a long-term Microsoft shareholder and added, generative AI is a game changer, there's no question, but the rush to market seemingly prioritizes short-term profits over long-term success. Now, outside of the weird pop culture connection of this, I think it's worth pointing out that if one is really interested in slowing down the AI arms race, shareholder activism, i.e. internal pressure that shifts the balance of market pressure, could be a very
Starting point is 00:06:46 promising vehicle, especially compared to people just lobbing complaints in from the outside. Moving to the geopolitics of AI for a moment, an update in a story that we've been covering, which is the United Arab Emirates company G42. We recently had that big New York Times feature about them and speculated a little bit about the nature of their relationship with OpenAI, which was just announced in October. And part of the tension was that they were deeply enmeshed with both China and the U.S. However, it appears that they are no longer able to be any sort of neutral middle, and the Financial Times reports, quote, UAE's top AI group vows to phase out Chinese hardware to appease U.S. Abu Dhabi-backed G42 says it cannot work with both sides and retain access to
Starting point is 00:07:25 American-made AI chips. Said chief executive Peng Xiao, for better or worse as a commercial company, we are in a position where we have to make a choice. We cannot work with both sides. We can't. Now, depending on your take, this could be an example of U.S. policy and pressure actually working, at least in terms of its stated objectives of denying China access to advanced AI infrastructure. It's a really interesting one for sure, and certainly something I'm going to continue watching. Lastly, I want to call out a viral project that has been flying all around Twitter slash X this week called MagicAnimate. It's basically a model by which you can take a static image, which can be from the world or can be generated by something like Midjourney or DALL-E 3,
Starting point is 00:08:05 reference it against a motion file from Meta's DensePose, and get an actual animation of that seed image that follows the motion from the reference file. I actually just did a tutorial of this for the AI Breakdown learning community beta that is happening right now. We're just coming to the close of the first week of this beta where we're doing tutorials, sharing case studies, and participating in follow-up challenges
Starting point is 00:08:26 that get people actually trying tools. Now, just before the main part of this episode, I'm going to share a little bit more about how you can get involved in January, if that's something that's interesting to you. But for now, if you're interested, go on Twitter slash X and see some very cool things indeed.
Starting point is 00:08:40 However, that is going to do it for today's AI Breakdown Brief. Next up, the main AI Breakdown. Hey guys, before we get into the main part of the episode, I wanted to mention just briefly that we are now in the midst, we're actually just closing out the first week, of the AI Breakdown AI education and learning beta. This is a community of learners where each day I'm dropping in tutorials, case studies, challenges, and a community of people are discussing them, going out and doing those
Starting point is 00:09:06 challenges, in other words, learning AI by doing, and getting a chance to ask questions and talk with people who are experiencing similar problems, taking advantage of similar opportunities, and generally adapting to this new AI-powered world. I'm incredibly encouraged by how it's going so far, and in about a week I'll be opening up registration for next month's second beta test for January. For now, I wanted to let you guys know that that was coming,
Starting point is 00:09:28 and if you are interested in getting on the wait list for that, go to bit.ly/aibeta. You'll see the short write-up that I did of December's beta, plus a link to a form where you can sign up for the wait list. I'd love to have you participate in January. So again, that's bit.ly/aibeta. And now, let's get to the main episode. In what I would consider highly unsurprising news, Meta has made a slew of new announcements about
Starting point is 00:09:56 AI features and tools. So first, why do I say it's unsurprising? Well, there's two reasons. One, the chaos and turbulence at OpenAI has provided an opportunity for all of the players who are not OpenAI to try to remind the world that they exist and exist as alternatives. But two, especially in a week where Google has answered that opportunity by announcing their most advanced model ever in Gemini, it stands to reason that Meta would try to get a piece of that narrative action as well. So what did we get?
Starting point is 00:10:24 Well, on December 6th, the company announced that they were testing more than 20 new generative AI tools. They write, to close out the year, we're testing more than 20 new ways generative AI can improve your experiences across Facebook, Instagram, Messenger, and WhatsApp, spanning search, discovery, ads, business messaging, and more. Now, right off the bat, we see something that we've been talking about a lot on the show, which is the fact that we are moving to a period where it's not just about advanced capability announcements, but also about integration into the tools that we're already using. Right up front, they say these 20 generative AI tools are going to improve people's experiences
Starting point is 00:10:57 on the platforms they're already using, Facebook, Instagram, Messenger, and WhatsApp. So what are some of these 20 things that they're testing now? Well, one is an update to Meta AI, which is their virtual assistant. They write, we're making it more helpful with more detailed responses on mobile and more accurate summaries of search results. We've even made it so you're more likely to get a helpful response to a wider range of requests. They write that Meta AI is now helping outside of chats as well. Quote, it's doing some of the heavy lifting behind the scenes to make our product experiences on Facebook
Starting point is 00:11:25 and Instagram more fun and useful than ever before. The large language model technology behind Meta AI is used to give people in various English language markets options for AI-generated post comment suggestions and community chat topic suggestions in groups, serve search results, and even enhance product copy in shops. Basically what they're saying is that in addition to their assistant chatbot getting better, the same technology, the same LLM that underpins that bot, is being used and integrated effectively everywhere that text lives inside Meta experiences, be it groups or the shopping experience or what have you. Now, part of the upgrade with Meta AI is a new ability to create images inside chats. Specifically,
Starting point is 00:12:01 they've added a new remix feature that they're calling reimagine. They write, here's how it works in group chat. Meta AI generates and shares the initial image you requested, and then your friend can press and hold on the picture to riff on it with a simple text prompt, and Meta AI will generate an entirely new image. Now you can kick images back and forth, having a laugh as you try to one-up each other with increasingly wild ideas. Basically, this is a social extension of an image generation tool. Another new upgrade to the Meta AI assistant is that it can pull in reels, which could be useful for things like planning trips, so you can, for example, see reels of the places that you might want to visit. And they say, this is just the first example of how we'll build even deeper
Starting point is 00:12:36 integrations across our apps to make Meta AI an even more connected and personal assistant over time. And you'll see at this point already that so many of these things that they're experimenting with are not big newsmaking changes, but just different workflows, user experience integrations and updates, and basically bringing AI tools effectively everywhere in the Meta suite of apps. Like I said, anywhere that there's language, they're testing recommendations. For example, they talk about how they're helping creators respond to their fans. We want to give creators generative AI tools to help them work more efficiently and connect with more of their community. We're starting to test suggested replies in DMs to help creators engage with their audiences faster and more easily.
Starting point is 00:13:12 Now, for those of you who have been experimenting with the character AIs, no relation of course to Character.AI, that Facebook slash Meta announced earlier this year, the big update here is that for a number of them, they've added long-term memory. Basically, they allow you to go away from a conversation and pick it back up right where you left off. Now, of course, requisitely, there also is an update around responsibility and safety. One of the banner announcements is that they're adding a new watermark to any images that are created with Meta AI's tools. They write, while it's imperceptible to the human eye, the invisible watermark can be detected with a corresponding model. It's resilient to common image manipulations like cropping, color change, screenshots, and more. We aim to bring invisible
Starting point is 00:13:50 watermarking to many of our products with AI generated images in the future. Now more broadly when it comes to safety, they write, we're continuing to invest in red teaming, which has been a part of our culture for years. As a part of that work, we pressure test our generative AI research and features that use large language models with prompts we expect could generate risky outputs. Recently, we introduced multi-round automatic red teaming, or MART, a framework for improving LLM safety that trains an adversarial and target LLM through automatic iterative adversarial red teaming.
Starting point is 00:14:16 We're working on incorporating the MART framework into our AIs to continuously red team and improve safety. So, like I said, the big theme here is not crazy banner new we've-got-Llama-3-style announcements, as hungry as some people are for that. Instead, it's clearly all about the integration phase of AI and how Meta's existing tools are going to find their way to be actually useful across their family of experiences. That said, to the extent that there was something that captured a lot of people's attention, it's that Meta has pulled its AI image generation out of the places that it lived inside its apps and created a public space for it at imagine.meta.com. So like Midjourney or Stable Diffusion or DALL-E 3, this is an image generation model, although this one has been trained on Facebook and Instagram photos. We'll see if that makes a difference in terms of what it's good at
Starting point is 00:15:03 in just a moment. As I said, it exists outside of the apps at imagine.meta.com, but it does require a Meta account to sign in with. Results so far are somewhat mixed. VentureBeat writes, VentureBeat's brief unscientific tests show that it only sporadically produced realistic human figures and structures. Often our imagery included strange glitches like melted body parts and scenery. They also point out that a lot of the features that we've gotten used to with other tools like Midjourney just don't exist. There's no option to remix from imagine.meta.com, although you can do that in its messenger apps. There's also no way to resize images. And on top of that, of course, there is controversy around how the model was trained. Now, this is built on top of Meta's own AI model that's called
Starting point is 00:15:41 Emu, but the issue is, of course, that it was trained on people's Facebook and Instagram images, although excluding things that were shared in private messages. Still, that hasn't stopped some people from saying that this is not how they ever intended for their images to be used, and that this represents a problem. Karla Ortiz writes, outrageous. Meta released a generative AI model trained on 1.1 billion images from Facebook and Instagram. Copyrighted works, our pictures and our loved ones' pictures all used to train this model. Tech companies are claiming ownership of things they do not own. This must stop. That said, a lot of people also think that this is actually probably Facebook trying to avoid copyright violation
Starting point is 00:16:13 issues, because its terms of service for users who share photos to its site are probably a lot more accommodative of this type of work than just scraping artwork from the public web. Now, it seems to me, and this is the impression that I think a lot of people have, that this initial version of Imagine is really meant to be a competitor to free image generation tools. It's not meant to stand next to Midjourney's paid model or anything like that, and so that may be a useful way of judging it. It's got a super simple user interface, which would suggest that as well. The couple of tests I did weren't that bad. I did rugged Santa, cinematic shot, bokeh, Christmas tree farm, misty background, and got some decent results. Blaine Brown
Starting point is 00:16:48 writes, first set of images via Meta's Emu slash Imagine generator is impressive. Prompt, a baby snow dragon next to a pine tree in a winter wonderland, cinematic. You can see, and this is something that people have noted, that it handles detail pretty well. Chase Lean, who often does comparisons between different image generators, did his own test to compare it to Midjourney, DALL-E 3, and Adobe Firefly across 10 image categories. One was a realistic photo. Two were landscape photos. And by the way, if you're watching this, it is the upper left that is Meta Imagine each time.
Starting point is 00:17:17 Product photos, text generation, which, spoiler alert, it wasn't really able to do, vector graphics, pixel art, which Meta didn't do at all, interior designs, close-up shots, wildlife photos, and coloring pages for a kid's book. Chase's summary, Meta's AI is good at making realistic photos. This is probably due to their training data. It's a decent free option and slightly better than DALL-E 3 in this regard. DALL-E 3 usually makes more cartoonish pics. However, it is clearly outmatched by Adobe Firefly and Midjourney, though both are paid. DALL-E 3 also beats it at text generation and the ability to follow instructions inside the prompt.
Starting point is 00:17:50 So, is Imagine revolutionary? Absolutely not. And in fact, I actually think it points to something different that's happening that is an important trend, which is the commoditization of advanced AI models. This year, things are still happening so fast that there are meaningful differences between how good these various models are. It's not just image generators, but also text generators and LLMs as well. However, at what point do things consolidate around a version that's so good that even if there are advances being made with GPT-5 and GPT-6, the vast majority of the world is using, for example,
Starting point is 00:18:23 DALL-E 3 or Midjourney 5.2-level quality image generation and GPT-4-level text generation? It feels like that's happening fast, and in that world, competition is going to be much more about exactly what Meta seems to be trying to compete on, which is integration into other apps and experiences. So, even if you are not overly impressed with the quality of images just yet, I still think it's worth keeping an eye on what they're doing, given how much access to people's existing behavior they have and how easily they can bring these tools into those experiences. However, for now, that's going to do it for today's AI Breakdown. I appreciate you listening or watching as always.
Starting point is 00:18:57 Until next time, peace.
