The AI Daily Brief: Artificial Intelligence News and Analysis - The 5 Most Exciting AI Tools Launched This Week

Episode Date: June 9, 2023

From text-to-3D avatars to text-to-video to music you create with gestures, these are the most interesting and exciting AI tools that launched or were announced this week. Before that, on the Brief: AI in Instagram. AI in WhatsApp. AI in Facebook. It's all on the way, according to an all-hands meeting this week with Meta CEO Mark Zuckerberg. Additionally, ChatGPT comes to iPad, Adobe launched Firefly for Enterprise, and more. The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on The AI Breakdown, we're discussing the most exciting and interesting new tools that were launched this week. Before that, on the Brief: Adobe's new enterprise offering, and Meta has plans to put AI in, well, everything. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Like, subscribe, and share, and go to breakdown.network for more information. Welcome back to The AI Breakdown Brief, all the AI headline news you need in five minutes or less. Yes, I am losing my voice, which is a good reminder that you might want to be reading this instead of hearing it. I have a newsletter that I publish every morning at theaibreakdown dot
Starting point is 00:00:37 beehiiv, which is spelled B-E-E-H-I-I-V, dot com. Now, our first story today is that, in response to having the wind taken out of their sails a little bit by Apple's Vision Pro announcement earlier this week, Mark Zuckerberg and Meta convened an all-hands meeting for the entire company. Meta has been on the ropes in many ways over the last year. It's had to make its first-ever layoffs, and its strategy has been seen by some as sputtering. However, there's no denying they've been quietly carving out a leadership position when it comes to AI, and in particular open-source AI. It feels like every other day there's some new Meta open-source research that we're sharing on this show. And at the all-hands yesterday, Zuckerberg gave a bit of an idea about
Starting point is 00:01:16 how it all might integrate into their existing products. In short, they're going to put generative AI, meaning text, image, and video generation, into everything. For example, people may be able to use text prompts to modify their own photos for Instagram and to chat with AIs that have different personalities via Messenger and WhatsApp, and Meta says AI is at the core of its plans for how it imagines the future metaverse rolling out. Zuckerberg said in a statement to Axios, In the last year, we've seen some really incredible breakthroughs, qualitative breakthroughs on generative AI, and that gives us the opportunity to now go and take that technology, push it forward, and build it into every single one of our products. Another Silicon Valley-based social network that's also
Starting point is 00:01:56 integrating generative AI into its products is LinkedIn. LinkedIn follows Google and Meta in adding new AI tools to its advertising suite. Basically, this is an AI copywriting support tool: advertisers will be able to put their initial text in a box and use AI to get different suggestions for how it might be improved to be more effective. Speaking of enterprise AI, Adobe got headlines yesterday by announcing that their Firefly AI suite was coming to the enterprise. Now, Adobe has been on an absolute AI tear recently, and this is no exception. Adobe's enterprise Firefly integration will allow people who use Photoshop, Illustrator, Express, and Experience Manager to modify photos directly from within those apps. Now, the big pitch to the enterprise is that Adobe
Starting point is 00:02:38 actually considers commercial viability. And what they mean by that is a couple of things. First of all, they've trained their model on their own proprietary library of stock images, so they're offering it as a better alternative to other services out there, some of which are facing lawsuits for having trained on someone else's images. And the other big piece of news is that they're planning on giving those enterprise customers an indemnification against copyright claims. I think these pretty significant steps suggest just how much interest there is in the enterprise in these new types of generative AI tools. However, it isn't only businesses and business customers that are getting new goodies in the AI realm. ChatGPT is now natively supported on iPad, a couple of weeks after coming
Starting point is 00:03:14 to iOS in general. In addition to iPad support, this new version of ChatGPT also has native support for Siri and Shortcuts. This has many speculating about linkups between Apple's Siri and OpenAI, but for now, it's all just rumor and innuendo. What isn't rumor and innuendo are concerns about how AI software can be manipulated to give away proprietary data. Recently, researchers at Robust Intelligence announced that they had found a way to break the guardrails of an Nvidia AI system. Their manipulations got the Nvidia system, for example, to release personally identifiable information from a database, which is obviously a big concern. As the Financial Times put it, the ease with which these researchers defeated the safeguards highlights the challenges AI companies face in attempting to
Starting point is 00:03:54 commercialize one of the most promising technologies to emerge from Silicon Valley for years. Another cautionary tale around AI comes from a new paper called GPT Detectors Are Biased Against Non-Native English Writers. The study was summed up by Wharton professor Ethan Mollick, who writes, You really, really shouldn't be relying on AI detectors for classroom use. This new paper shows that not only are they very easy to defeat by just prompting a couple of times, but they have insane false positive rates against non-native English speakers. Other reasons detectors don't work, he says: One, they are often trained on GPT-3.5, so GPT-4 beats them. Two, even if they alert you to potential AI use, there is no way to verify that it is true. Three, students working interactively with
Starting point is 00:04:34 AI defeat them, as my class found. There have been a number of high-profile stories recently that have made national press about professors or teachers falsely identifying their students as having used ChatGPT in assignments or essays. And while I think there's effectively going to be a never-ending spigot of capital for AI detection technology, because it seems so important, the point this paper makes is that right now that technology is not to be trusted. Lastly today, if we started with the theme of how LLMs and AI are being customized for the enterprise, we close with a trend in a different direction: AI customized for personal use. Yesterday, Timothy Carambat wrote, Announcing AnythingLLM, an open-source full-stack app for chatting with anything.
Starting point is 00:05:14 UI for managing documents using OpenAI, Pinecone, and LangChain. He says, First, you can find AnythingLLM on GitHub. No bulky CPU or GPU is required. Chatting with your documents is the hello world of LLM use cases. Why not make it more accessible? Timothy goes on: What's so special about AnythingLLM? No crazy system requirements; it runs fast and passively on your machine. Full data collection tool suite: collect anything, from entire YouTube channels, Substacks, Mediums, and GitBooks to local documents. Persistent and usable, he says: shut down the app and start it later, and all your documents, chats, and more are still present, picking up right where you left off. The database is saved locally on your machine. This is a trend that people like Brian Roemmele have been talking
Starting point is 00:05:53 a lot about. He shared a clip from Lex Fridman's recent interview with Mark Zuckerberg and said, Private and personal AI is sweeping the world. Mark Zuckerberg admits they are using versions developed by the open-source community. Now, one more very cool tool to close this show out, from friend of the show Emmet Halm, who says, The wait is over: train a chatbot on an entire YouTube channel. YouTube-to-Chatbot is now open source and live on GitHub. Obviously, as someone with multiple YouTube channels who is interested in AI and chatbots, this is a project that I have been following closely, and I may have been one of the content creators that Emmet references as having talked to him behind the scenes about it. Anyways, there are a lot of exciting possibilities with the
Starting point is 00:06:30 ability to train chatbots on YouTube channels, so I'm excited to see that work. For now, though, guys, that is it for today's AI Breakdown Brief. If you're enjoying, please like, subscribe, and share, and I will be back soon with the main AI Breakdown. A quick note on this episode: it's a little bit more visual than normal because I'm showing off some of these tools, but I've tried to still make it accessible in podcast form, so I just wanted to give you that heads-up before you dive in. Today on The AI Breakdown, we're talking about the five most exciting and interesting AI tools that were launched this week. At number five, we have something that hasn't really launched but is still at the research stage, and it has been capturing attention because of this viral Twitter video.
Starting point is 00:07:11 Nate Barcy writes, This is insane. It generates audio based on motion in real time. This is probably the most compelling AI instrument I've recently seen in the audio industry. Now, this research is all up on GitHub. It's co-funded by the European Union, and so if you have the inkling, you can go check this out on your own computer. There are a couple of reasons that I thought this was interesting. One, this is not the first time over the last few years that we've seen people get excited about contactless or touchless musical instruments. Intel demoed something similar at the 2016 Consumer Electronics Show in Vegas with their Curie, which was a button-sized hardware module for wearable devices.
Starting point is 00:08:11 And then, of course, there's the theremin, which has been around since the 1920s. The way the theremin works is that you are interacting with electric fields. There are two antennas connected to oscillator circuits, one that controls pitch and one that controls volume. The theremin uses something called heterodyning: in the pitch circuit, a fixed high-frequency oscillator is mixed with a second oscillator whose frequency shifts as your hand approaches the pitch antenna, and the difference between those two frequencies is the audible sound. The closer your hand is to the pitch antenna, the higher the note, and the closer it is to the volume antenna, the quieter the sound. Now, the theremin has been called one of the most challenging, if not the most challenging, instruments in the world. But what I thought when I saw this is that we might be moving into a world where hand gestures are the new interface that we all get used to. It seems far away right now. We're used to pointing and clicking with a mouse, but Apple seems poised to try to disrupt that once again.
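To make the heterodyning idea concrete, here is a minimal Python sketch. It assumes two idealized sine-wave oscillators and uses made-up frequencies (260,000 Hz and 260,440 Hz are hypothetical values picked for easy arithmetic, not the specs of any real theremin). Multiplying the two signals is equivalent, by the identity sin(a)sin(b) = 0.5[cos(a - b) - cos(a + b)], to a sum-frequency term far above human hearing plus a difference-frequency term; a real instrument filters out the former, leaving the difference as the note you hear.

```python
import math

# Hypothetical, illustrative frequencies (not measured from a real theremin).
F_FIXED_HZ = 260_000      # reference oscillator, frequency stays put
F_VARIABLE_HZ = 260_440   # oscillator whose frequency shifts with hand capacitance


def mix(f1: float, f2: float, t: float) -> float:
    """Heterodyne (multiply) two unit sine oscillators at time t in seconds."""
    return math.sin(2 * math.pi * f1 * t) * math.sin(2 * math.pi * f2 * t)


def mix_as_sum_and_difference(f1: float, f2: float, t: float) -> float:
    """The same signal rewritten as 0.5 * [cos(a - b) - cos(a + b)]."""
    a = 2 * math.pi * f1 * t
    b = 2 * math.pi * f2 * t
    return 0.5 * (math.cos(a - b) - math.cos(a + b))


def audible_pitch(f1: float, f2: float) -> float:
    """Once a low-pass filter removes the (f1 + f2) term, |f1 - f2| is what remains audible."""
    return abs(f1 - f2)


if __name__ == "__main__":
    # The product form and the sum/difference form agree sample by sample...
    for n in range(5):
        t = n / 48_000  # sample times at 48 kHz
        assert math.isclose(
            mix(F_FIXED_HZ, F_VARIABLE_HZ, t),
            mix_as_sum_and_difference(F_FIXED_HZ, F_VARIABLE_HZ, t),
            abs_tol=1e-9,
        )
    # ...and the audible tone is the difference frequency.
    print(audible_pitch(F_FIXED_HZ, F_VARIABLE_HZ))  # 440
```

Running it prints 440, the gap between the two oscillators. Moving a hand relative to the pitch antenna shifts the variable oscillator and therefore that gap, which is why small gestures change the note; the sketch does not try to model the antenna's capacitance itself.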
Starting point is 00:09:16 With the Apple Vision Pro, people will be doing things like pinching and holding to highlight text on the quote-unquote screen that they're viewing, pinching and dragging to scroll around windows in front of them, zooming by pinching with both hands and then pulling them apart, and a bunch of others. So could it become the default to use our hands to control every interface? I guess we'll just have to see. Number four on our list of cool new tools this week is Stability AI's ClipDrop. And for this, we need a little bit of background context. Adobe Photoshop's Generative Fill has given the internet the ability not only to fix blemishes or change things within photos, but also to expand the field of view around them.
Starting point is 00:09:49 This has led to a trend of people exploring what famous works of art would look like in an expanded version. So you have here the Mona Lisa, Vincent van Gogh's The Starry Night, Edward Hopper's Nighthawks, Botticelli's The Birth of Venus, and more. Now, the internet got a little bit prickly about this, but I thought it was all in good fun. Well, Stability AI seems to have agreed with me, because they just announced a new feature called Uncrop for their ClipDrop app. They call it a game-changing outpainting technology that creates AI-generated backgrounds to expand any image. So effectively, what Stability is doing here is taking this use case that has become popular
Starting point is 00:10:23 in this new Photoshop tool and turning it into a product all on its own. There are obviously significant UI benefits when you make something like this a core feature and can design around it. So I'm sure we're going to see tons and tons of people playing with Stability's new Uncrop feature in ClipDrop. Never one to stand still, though, Adobe is not content to let Stability have all the fun. They also just announced a new update to their Express tool. This is really Adobe's Canva-style graphics creator, but now they're integrating text-prompted generative AI directly into the tool. So, for example, if you want flowers on your pop-up shop invite, you just write flowers and they appear. You can also use text instructions for things like modifying the type,
Starting point is 00:11:00 changing sizing, even animating features of what you create. Howard Pinsky points out that there is even a character animator that syncs with your voice. Okay, this is kind of fun. Adobe Express just added a character animator to their quick actions, where I can browse through a variety of really fun characters. I can change the background to fit the scene I'm looking for, and then record my voice and even enhance the speech if I don't have access to a good mic. And when I'm done, it's going to sync the audio and animate the character for
Starting point is 00:11:27 me. I may look like I just stepped in a pile of alien slime, but trust me, I'm excited to be brought to life using Adobe Express. Adobe's Emery Wells writes, Major day for the Adobe Express team. Our revamped Express eliminates technical hurdles, unlocking creative prowess for all. Now, just to get a sense of how fast Adobe is moving, this came out the same week that Adobe announced their new Firefly integrations for their enterprise suite. Those integrations, as I discussed on the Brief this morning, include indemnity for enterprises that use Adobe's tools to create images. As a dyed-in-the-wool Canva user, I am excited to see if Canva can keep up with what we're seeing from Adobe Express. The character animator that Howard played around with was not, however,
Starting point is 00:12:07 the only character-generation tool we've seen this week. Ahsen Khaliq (AK) pointed to new research, StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. The summary of the paper describes a novel method for generating high-quality stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a generative adversarial network (GAN)-based 3D generation network for training. Another research paper along similar lines is called HeadSculpt: Crafting 3D Head Avatars with Text. The paper says, Recently, text-guided 3D generative models have made remarkable advancements in producing high-quality textures and geometry. However, existing methods still struggle to create high-fidelity 3D head avatars in two aspects. One, they rely mostly on a pre-trained text-to-image diffusion model while missing the necessary 3D awareness and head priors.
Starting point is 00:12:56 This makes them prone to inconsistency and geometric distortions. And two, they fall short in fine-grained editing. This is primarily due to the inherited limitations from pre-trained 2D image diffusion models, which become more pronounced when it comes to 3D head avatars. In response, the research team introduces a versatile coarse-to-fine pipeline dubbed HeadSculpt to improve on these fronts. But if the research was impressive, so are the applied tools for creating AI avatars. Jazz Singh shared this viral thread about how he created an AI avatar video directly in ChatGPT. This was using the HeyGen plugin, which is available from
Starting point is 00:13:31 the plugin store with GPT-4. All he had to do was provide the gender of the avatar, the text that he wanted the avatar to say, and the title of the video, and this is what came out. Hello, I'm an AI avatar, created by Jazz Singh using ChatGPT plugins. If you want to know how I was created, be sure to read the thread. Still, as viral as that thread was, it wasn't really what got people hyped. What got people really hyped this week was DAZ 3D's text-to-3D character engine. In the video that we're watching, you can see that the demo changes everything from ethnicity to gender to clothing, all using text. And of course, these 3D characters do not have to stay human or realistic in any meaningful way.
Starting point is 00:14:10 The character engine is called Tafi, and DAZ says, We're training our foundational model using a proprietary synthetic dataset comprised of tens of billions of unique characters. Meticulously labeled and organized, this dataset pulls from our 20-year history in 3D art. Now, we've seen a lot of world-building tools recently. I've talked to you guys before about Blockade Labs' new Skybox tool, in which users can draw the dimensions of the worlds that they want to create and then use text to describe the environment, which Blockade then renders in rich detail.
Starting point is 00:14:38 These 3D characters are, of course, the type of thing that you would put in these environments, and what gets exciting is imagining how many more people are going to be able to easily create these types of 3D world environments and the characters that inhabit them. The storytelling and entertainment possibilities start to get really endless, really fast. And that, of course, brings us to our number one coolest tool release this week. It had to be number one with a bullet: Runway's Gen-2. Runway is a text-to-video leader, and already with Gen-1, they had people salivating at the creative possibilities being unleashed. About a month ago, they started rolling out beta access to Gen-2,
Starting point is 00:15:15 but as of this week, Gen-2 is now available to everyone. Now, like Gen-1, each individual generated video is limited to four seconds, and because of that, users have been creating a lot of short videos and movie trailers as their first attempts. Julie W. Design released a video of a bucolic farm setting and writes, There's something about Gen-2 that gets me every time. It fits perfectly into my non-existent knowledge of video while putting it all together in iMovie. Dylan says, Runway Gen-2 text-to-video just went public, and you can try it for free.
Starting point is 00:15:45 Here's how it imagined a robot walking through 19th-century Kyoto. Marcel Klimo writes, Lots of room for improvement, but if this improves as fast as Midjourney has since this time last year, we could be seeing completely believable generated video in just a few months. Ventunort agrees, saying, Despite its numerous imperfections, it's clear that the entire audiovisual industry is about to be revolutionized.
Starting point is 00:16:05 Mr. Allen T. created a music video for the fictional band Metal Henry and the Lovebots. He said, I first generated the images with Midjourney and used those as my base for the scenes in Runway. This gave me better control. Motion is the hardest part and requires some re-rolls. By the way, Allen also generated the music, with Google's MusicLM. His prompt was an '80s synth rock band with guitars and a keyboard.
Starting point is 00:16:44 And of course, it wouldn't be the internet without someone attempting to recreate The Lord of the Rings with this new AI tool. So obviously we had to do a quick test to close out this video. I'm going to generate a four-second video based on the prompt: a cinematic shot of a podcaster sitting in a forest with their microphone, looking wistfully around as animals watch him podcast. Now, I entered that same prompt into Midjourney to get a reference photo. This was the one that I chose, although there were some other great options as well. You can see that Runway Gen-2 allows you to use a reference photo.
Starting point is 00:17:34 Let's generate it from here and see how it goes. And here we have it. Apparently, I'm interviewing a fox-bear thing with this strange microphone-man setup that I can only assume is a ghost. Anyways, guys, clearly this is not a full experiment. But I, for one, am still incredibly excited to keep testing things out. That is it for today's AI Breakdown. If you're enjoying the show, please like, subscribe, and share. Check out the podcast and the newsletter version. And until next time, peace.
