The AI Daily Brief: Artificial Intelligence News and Analysis - The 5 Most Exciting AI Tools Launched This Week
Episode Date: June 9, 2023
From text-to-3D avatar to text-to-video to music you create with gestures, these are the most interesting and exciting AI tools that launched or were announced this week. Before that on the Brief A...I in Instagram. AI in WhatsApp. AI in Facebook. It's all on the way, according to an all-hands meeting this week with Meta CEO Mark Zuckerberg. Additionally, ChatGPT comes to iPad, Adobe launched Firefly for Enterprise, and more. The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're discussing the most exciting and interesting new tools that were launched this week.
Before that on the Brief, Adobe's new enterprise offering, and Meta has plans to put AI in, well, everything.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Like, subscribe and share, and go to Breakdown.network for more information.
Welcome back to the AI Breakdown Brief.
All the AI headline news you need in five minutes or less.
Yes, I am losing my voice, which is a good reminder that you might want to be reading
this instead of hearing it. I have a newsletter that I publish every morning at
theaibreakdown.beehiiv.com, which is spelled B-E-E-H-I-I-V dot com. Now, our first story today is that in response to
having the wind taken out of their sails a little bit based on Apple's Vision Pro announced earlier
this week, Mark Zuckerberg and Meta convened an all-hands meeting for the entire company.
Meta has been on the ropes in many ways over the last year. It's had to have its first
layoffs ever, and its strategy has been seen by some as sputtering. However,
there's no denying they've been quietly carving a leadership position when it comes to AI, and in
particular, open source AI. It feels like every other day there's some new Meta open source research
that we're sharing on this show. And at the all hands yesterday, Zuckerberg gave a bit of an idea about
how it all might integrate into their existing products. In short, they're going to put generative
AI text, image, and video generation in everything. So for example, people will be able to potentially
text prompt to modify their own photos for Instagram, have AI with different personalities that they can
engage with via Messenger and WhatsApp, and they say AI is at the core of their plans for how they
imagine the future Metaverse rolling out. Zuckerberg said in a statement to Axios, in the last
year, we've seen some really incredible breakthroughs, qualitative breakthroughs on generative AI,
and that gives us the opportunity to now go and take that technology, push it forward, and build
it into every single one of our products. Another Silicon Valley-based social network that's also
integrating generative AI into its products is LinkedIn. LinkedIn follows Google and Meta in putting new
AI tools in their advertising suite. Basically, this is a copywriting AI support tool, and so advertisers
will be able to put their initial text in a box and use AI to get different suggestions for how
it might be improved to be more effective. Speaking of Enterprise AI, Adobe got headlines yesterday
by announcing that their Firefly AI suite was coming to the Enterprise. Now, Adobe has been on an
absolute AI barn burner recently, and this is no exception. Adobe's Enterprise Firefly integration
will allow people who use Photoshop, Illustrator, Express, and Experience Manager to modify
photos directly from within those experiences. Now, the big pitch to the enterprise is that Adobe
actually considers commercial viability. And what they mean by that is a couple things. First of all,
they've trained their model on their own proprietary suite of stock images, so they're offering that
as a better alternative to services out there, which are, for example, under lawsuit for having
trained on someone else's images. And the other big piece of news is that they're planning on giving
those enterprise customers indemnification against copyright claims. I think these pretty
significant steps suggest just how much interest there is in the enterprise in these new types of
generative AI tools. However, it isn't only businesses and business customers that are getting
new goodies in the AI realm. ChatGPT is now natively supported on iPad a couple weeks after coming
to iOS in general. In addition to just iPad support, this new version of ChatGPT also has native
support for Siri and Shortcuts. This has many speculating about linkups between Apple's Siri and
OpenAI, but for now, it's all just rumor and innuendo. What's not rumor and innuendo is concerns about
how AI software can be manipulated to give away proprietary data. Recently, researchers at Robust
Intelligence announced that they had found a way to break the guardrails of an Nvidia AI system.
Their manipulations got Nvidia, for example, to release personally identifiable information from a
database, which is obviously a big concern. As the Financial Times put it, the ease with which these
researchers defeated the safeguards highlights the challenges AI companies face in attempting to
commercialize one of the most promising technologies to emerge from Silicon Valley for years.
Another cautionary tale around AI comes from a new paper called GPT Detectors Are Biased Against Non-Native
English Writers. The study was summed up by Wharton Professor Ethan Mollick, who writes,
You really, really shouldn't be relying on AI detectors for classroom use. This new paper shows that
not only are they very easy to defeat by just prompting a couple of times, but they have insane
false positive rates against non-native English speakers. Other reasons detectors don't
work, he says. One, they are often trained on GPT-3.5, so GPT-4 beats them. Two, even if they alert you to
potential AI use, there is no way to see that it is true. Three, students working interactively with
AI defeats the test, as my class found. There have been a number of high profile stories recently
that have made national press around professors or teachers falsely identifying their students
as having used ChatGPT in assignments or in essays. And while I think that there's effectively
going to be a never-ending spigot of capital for AI detection technology, as it seems so important,
the point that this paper makes is that right now that technology is not to be trusted.
Lastly, today, if we started with the theme of how LLMs and AI are being customized for the enterprise,
we close with a different directional trend of AI, which is to have it customized for personal use.
Yesterday, Timothy Carambat writes, announcing AnythingLLM, an open source full-stack app for chatting with anything.
UI for managing documents using OpenAI, Pinecone, and LangChain.
He says, first, you can find AnythingLLM on GitHub.
No bulky CPU or GPU is required.
Chatting with your documents is the hello world of LLM use cases. Why not make it more accessible?
Timothy goes on. What's so special about AnythingLLM? No crazy system requirements; it runs fast and passively on your machine.
Full data collection tool suite. Collect anything. Entire YouTube channels, Substacks, Mediums, GitBooks, and local document processing.
Persistent and usable, he says. Shut down the app and start it later. All your documents, chats, and more are still present, picking up right where you left off.
The database is locally saved on your machine. This is a trend that people like Brian Roemmele have been talking
a lot about. He shared a clip from Lex Fridman's recent interview with Mark Zuckerberg and said,
Private and personal AI is sweeping the world. Mark Zuckerberg admits they are using versions
developed by the open source community. Now, one more very cool tool to close this show out from
friend of the show, Emmett Homm, who says, the wait is over, train a chatbot on an entire
YouTube channel. YouTube to Chatbot is now open source and live on GitHub. Now, obviously,
as someone with multiple YouTube channels and who is interested in AI and chatbots, this is a project
that I have been following closely and may have been one of the content creators that Emmett references
as having talked to him behind the scenes about it. Anyways, a lot of exciting possibilities with the
ability to train chatbots on YouTube channels so excited to see that work. For now, though, guys,
that is it for today's AI Breakdown Brief. If you're enjoying, please like, subscribe and share,
and I will be back soon with the main AI Breakdown. A quick note on this breakdown, it's a little
bit more visual than normal because I'm showing off some of these tools, but I've tried to still make it
accessible in podcast form, so just wanted to give you that heads up before you dive in.
Today on the AI Breakdown, we're talking about the five most exciting and interesting AI tools
that were launched this week. At number five, we have something that isn't really launched,
but is just at the research stage and has been capturing attention because of this viral Twitter video.
Nate Barcy writes, this is insane. It generates audio based on motion in real time.
This is probably the most compelling AI instrument I've recently seen in the audio
industry. Now, this research is all up on GitHub. It's co-funded in part by the European Union,
and so if you have the inkling, you can go check this out on your own computer. There are a couple
reasons that I thought this was interesting. One, this is not the first time over the last few years
that we've seen people get excited about contactless or touchless musical instruments.
Intel demoed something similar at the 2016 Consumer Electronics Show in Vegas, with their
Curie, which was a button-sized hardware module for wearable devices.
And then of course there's the theremin, which has been around since the 1920s.
The way that the theremin works is that you are interacting with electric signals.
There are two antennas connected to oscillators, one that controls pitch and one that controls volume.
The theremin uses something called heterodyning, which combines the frequencies from the two antennas to create an audible sound.
The closer your hand is to the pitch antenna, the higher the note, and the closer it is to the volume antenna, the quieter the sound.
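For anyone reading along who wants to see the heterodyning idea in numbers, here is a minimal sketch. It assumes the common theremin design in which the pitch antenna's oscillator is mixed against a fixed reference oscillator, and the audible note is the difference between the two frequencies. The function names, the inductance, and the capacitance values are illustrative assumptions, not measurements from any real instrument.

```python
import math


def lc_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency in Hz of an ideal LC oscillator."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def theremin_pitch_hz(
    hand_capacitance_f: float,
    inductance_h: float = 1e-3,          # assumed 1 mH coil (illustrative)
    base_capacitance_f: float = 100e-12,  # assumed 100 pF tank capacitor (illustrative)
) -> float:
    """Audible beat frequency from mixing a fixed and a hand-detuned oscillator.

    The hand near the pitch antenna is modeled as a small extra capacitance
    added to the variable oscillator. Heterodyning yields the difference
    between the two radio-frequency signals, which lands in the audio range.
    """
    fixed = lc_frequency(inductance_h, base_capacitance_f)
    variable = lc_frequency(inductance_h, base_capacitance_f + hand_capacitance_f)
    return abs(fixed - variable)


if __name__ == "__main__":
    # A hand moving closer adds more capacitance, which raises the pitch.
    for picofarads in (0.0, 0.1, 0.2, 0.5):
        pitch = theremin_pitch_hz(picofarads * 1e-12)
        print(f"hand capacitance {picofarads:.1f} pF -> about {pitch:.0f} Hz")
```

In this toy model both oscillators sit around half a megahertz, far above hearing, yet a fraction of a picofarad from the hand is enough to pull their difference into the audible range, which is why tiny hand movements change the note so much.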
Now the theremin has been called one of the most challenging, if not the most challenging instrument in the world.
But what I thought when I saw this is that we might be moving into a world where hand gestures are the new interface that we all get used to.
It seems far away right now. We're used to pointing and clicking on a mouse, but Apple seems poised to try to disrupt that once again.
With the Apple Vision Pro, people will be doing things like pinching and holding to highlight text on the quote unquote screen that they're viewing,
pinching and dragging to scroll around windows in front of them, zooming by pinching your hands together and then pulling apart, and a bunch of others.
So could it become more default to use our hands to control every interface?
I guess we'll just have to see.
Number four on our list of cool new tools this week is Stability AI's ClipDrop.
And for this, we have to have a little bit of background context.
Adobe Photoshop's generative fill has given the internet the ability not only to fix blemishes or change things within photos,
but also to expand the field of view around them.
This has led to a trend of people exploring what famous works of art would look like in an expanded version.
So you have here the Mona Lisa, Vincent van Gogh's Starry Night,
Edward Hopper's Nighthawks, Botticelli's The Birth of Venus, and more.
Now, the internet got a little bit prickly about this, but I thought it was all in good fun.
Well, Stability AI seems to have agreed with me because they just announced a new feature called Uncrop for their
ClipDrop app.
They call it a game-changing outpainting technology that generates AI-generated backgrounds to expand any image.
So effectively, what Stability is doing here is taking this use case that has become popular
in this new Photoshop tool and turning it into a product all on its own.
There are obviously significant UI benefits when you make something like this a core feature and can design around it.
So I'm sure we're going to see tons and tons of people playing with ClipDrop's new Uncrop feature by Stability.
Never standing still, though, Adobe is not content to let Stability have all the fun.
They also just announced a new update to their Express tool.
This is really Adobe's Canva-style graphics creator, but now they're integrating generative AI text directly into the tool.
So, for example, if you want flowers on your pop-up shop invite, you just write flowers and it comes up.
You can also use text instructions for things like modifying the type,
changing sizing, even animating features of what you create.
Howard Pinsky points out that there is even a character animator that syncs with your voice.
Okay, this is kind of fun.
Adobe Express just added a character animator to their quick actions
where I can browse through a variety of really fun characters.
I can change the background to fit the scene I'm looking for,
and then record my voice and even enhance the speech if I don't have access to a good mic.
And when I'm done, it's going to sync the audio and animate the character for
me. I may look like I just stepped in a pile of alien slime, but trust me, I'm excited to be
brought to life using Adobe Express. Adobe's Emery Wells writes major day for the Adobe Express team.
Our revamped Express eliminates technical hurdles, unlocking creative prowess for all.
Now, just to get a sense of how fast Adobe is moving, this came out the same week that Adobe announced
their new Firefly integrations for their Enterprise Suite. Those integrations, as I discussed on the
brief this morning, include indemnity for enterprises that use Adobe's tools to create
images. As a dyed-in-the-wool Canva user, I am excited to see if they can keep up with what we're
seeing from Adobe Express. The character animator that Howard played around with was not, however,
the only character-generation tool we've seen this week. A.K. Halik pointed to new research called
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation.
The summary of the paper says,
A novel method for generating high-quality stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a generative adversarial network or GAN-based 3D generation network for training.
Another research paper along similar lines was called HeadSculpt: Crafting 3D Head Avatars with Text.
The paper says recently text-guided 3D generative models have made remarkable advancements in producing high-quality textures and geometry.
However, existing methods still struggle to create high-fidelity 3D head avatars in two aspects.
One, they rely mostly on a pre-trained text-to-image diffusion model while missing the necessary 3D awareness and head priors.
This makes them prone to inconsistency and geometric distortions.
And two, they fall short in fine-grained editing.
This is primarily due to the inherited limitations from pre-trained 2D image diffusion models,
which become more pronounced when it comes to 3D head avatars.
In response, the research team is introducing a versatile coarse-to-fine pipeline dubbed HeadSculpt to improve on these fronts.
But if the research was impressive, so are the applied tools
creating 3D avatars. Joss Singh shared this viral thread about how he created an AI
avatar video directly in ChatGPT. This was using the HeyGen plugin, which is available from
the plugin store with GPT-4. All he had to do was provide the gender of the avatar, the text
that he wanted the avatar to say, and the title of the video, and this is what came out.
Hello, I'm an AI Avatar, created by Jazz Singh using ChatGPT plugins. If you want to know
how I was created, be sure to read the thread.
Still, if that thread was viral, that wasn't really what got people hyped.
What got people really hyped this week was DAZ 3D's text to 3D character engine.
So in the video that we're watching, you can see that the demo changes everything from ethnicity to gender to clothing, all using text.
And of course, these 3D characters do not have to stay human or real in any meaningful way.
The character engine is called Tafi, and DAZ says,
We're training our foundational model using a proprietary synthetic dataset comprised of tens of billions of unique characters.
Meticulously labeled and organized, this dataset pulls from our 20-year history in 3D art.
Now, we've seen a lot of world-building tools recently.
I've talked to you guys before about Blockade Labs' new skyboxes,
through which users can draw the dimensions of the worlds that they want to create,
and then use text to describe the environment,
which Blockade then renders in rich terms.
These 3D characters are, of course, the type of thing that you would put in these environments,
and what gets exciting is imagining how many more people are going to be able to easily create
these types of 3D world environments and the characters that inhabit them.
The storytelling and entertainment possibilities start to get really endless, really fast.
And that, of course, brings us to our number one coolest tool release this week.
Had to be number one with a bullet, Runway's Gen 2.
Runway is a text-to-video leader, and already with Gen 1, they had people salivating at the creative possibilities that were being unleashed.
About a month ago, they started rolling out beta access to Gen 2,
but as of this week, Gen 2 is now available to everyone.
Now, like Gen 1, the total creation time of any individual video is limited to 4 seconds,
and because of that, users have been creating a lot of short videos and movie trailers as their first attempts.
Julie W. Design released a video of a bucolic farm setting and writes,
There's something about Gen 2 that gets me every time.
It fits perfectly into my non-existent knowledge of video while putting it all together in iMovie.
Dylan says,
Runway Gen 2 text-to-video just went public and you can try it for free.
Here's how it imagined a robot walking through 19th century Kyoto.
Marcel Klimo writes,
Lots of room for improvement,
but if this goes as fast as Midjourney from this time last year,
we could be seeing completely believable generated video in just a few months.
Ventunort agrees, saying,
despite its numerous imperfections,
it's clear that the entire audiovisual industry is about to be revolutionized.
Mr. Allen T. created a music video for the fictional band,
Metal Henry, and the lovebots.
He said, I first generated the images with Midjourney
and used that as my base for the scenes in Runway.
This gave me better control.
Movement is the hardest part and requires some re-rolls.
By the way, Alan also generated the music with Google's MusicLM.
His prompt was an '80s synth rock band with guitars and a keyboard.
And of course, it wouldn't be the internet without someone attempting to recreate the Lord of the Rings with this new AI tool.
So obviously we had to do a quick test to close out this video.
I'm going to generate a four-second video based on the prompt,
a cinematic shot of a podcaster sitting in a forest with their microphone,
looking wistfully around as animals watch him podcast.
Now I entered that same prompt into Midjourney to get a reference photo.
This was the one that I chose, although there were some other great options as well.
You can see that Runway Gen 2 allows you to use a reference photo.
Let's generate it from here and see how it goes.
And here we have it.
Apparently, I'm interviewing a fox bear thing with this strange microphone man setup that I can only assume is a ghost.
Anyways, guys, clearly this is not a full experiment video.
But I, for one, am still incredibly excited to keep
testing things out. That is it for today's AI Breakdown. If you're enjoying the show,
please like, subscribe, and share. Check out the podcast and the newsletter version. And until
next time, peace.
