Everyday AI Podcast – An AI and ChatGPT Podcast - EP 256: Microsoft's VASA-1 AI Deepfake: So good it's dangerous?

Episode Date: April 23, 2024

Is this AI tool too good to be safe? Maybe. We'll be taking a look at Microsoft's impressive VASA-1 deepfake cloning technology. We'll answer the important questions:
↳ What is this... technology?
↳ Why do we need it?
↳ Is it so good that it's dangerous?

Related Episodes:
Ep 211: OpenAI’s Sora – The larger impact that no one’s talking about
Ep 157: Future of AI Video – Pika Labs 1.0, Runway updates, Meta Emu and more

Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com

Timestamps:
01:20 Daily AI news
05:15 Is VASA-1 dangerous?
09:12 Potential dangers of AI outweigh potential benefits.
12:14 Voice inflection changes, impressive text to speech.
15:29 Generative content creation: impressive, concerning, and exciting.
18:54 Quickly scalable technology opens exciting possibilities but troubling.
21:23 Real-time face swapping with surprising computing power.
24:35 Realistic motion control, capturing human-like tendencies.
29:26 Use digital avatars for efficient training purposes.
30:16 Microsoft research paper VASA-1 creates human-like avatars.
34:20 Discussing generative AI's potential benefits versus risks.

Topics Covered in This Episode:
1. Explanation of VASA-1's capabilities
2. Concerns around Microsoft's VASA-1
3. Discussion on deepfakes
4. Potential Risks and Ethical Considerations
5. Potential Benefits and Positive Use Cases

Keywords: Microsoft's VASA-1, deepfake model, Everyday AI, livestream podcast, AI news, Adobe, Photoshop, Google, DeepMind, Microsoft Phi-3 Mini model, OpenAI's Sora, AI deepfakes, talking head images, Jordan Wilson, misinformation, disinformation, generative AI, AI's impact on jobs, lifelike talking faces, digital twin technology, AI avatar, AI-generated speech, Mona Lisa, Baidu's Emo, Synthesia, HeyGen, Hour One, D-ID platform, training, personalized learning

Transcript
Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Is Microsoft's VASA 1 deep fake model so good that it's dangerous?
Starting point is 00:00:53 Do we need this technology? Should we be avoiding it? Or should we actually be rushing toward its development and adoption? We're going to be talking about that today and more on Everyday AI. What's going on, y'all? My name is Jordan Wilson. I'm the host of Everyday AI, and we are your daily guide to learning and leveraging gen AI to grow your company and to grow your careers.
Starting point is 00:01:21 So yes, we do this every day. It is live. It is unscripted. And we are a daily live stream podcast and free daily newsletter, helping us all. So we're going to get to that in just a second. But whether you're driving in the car or joining us live, make sure, if you haven't already, go to YourEverydayAI.com and sign up for our free daily newsletter. And on our website, it's like a free generative AI university.
Starting point is 00:01:43 So hundreds, literally, of back episodes that you can go and watch and learn anything that you want. All right. So before we talk about Microsoft's new VASA 1 deep fake AI technology, and I've got some takes on this, y'all. But before we get into it, let's start as we do every single day with the AI news. Some big stories today. So Adobe has introduced some Gen AI upgrades to Photoshop with its new Firefly V3 updates. So Adobe has unveiled some new generative AI upgrades to Photoshop, including the new Generate Image feature powered by the new Firefly Image 3 AI model.
Starting point is 00:02:21 So users can generate images directly within Photoshop by typing a text prompt or selecting from preset options. So the addition of Generate Image aims to help new users overcome that empty page feeling and unleash their creativity. Adobe also enhanced the generative fill feature, allowing users to add a reference image, super helpful for guided image generation. So a lot of these kind of quote unquote newer features that we were seeing in Midjourney are now making their way to Adobe's Adobe Firefly.
Starting point is 00:02:53 And all of these new features are available in the latest Photoshop beta app for desktop at $2.99 a month. Very specific price there, Adobe. So our next piece of AI news, we got all heavy hitters today. So we go from Adobe to Google. Google has consolidated their AI teams under DeepMind to strengthen their AI portfolio. So Google is consolidating its teams focusing on AI model development across Research and DeepMind divisions.
Starting point is 00:03:22 So responsible AI teams emphasizing safe AI development are being relocated from research, from Google Research to Google DeepMind to enhance proximity to AI model building and scaling. So this move from Google, pretty big, actually, but it comes amidst increasing global concerns about AI safety and calls for technology regulation. So not just that, but we also just saw about a month ago, Microsoft kind of create its new division, Microsoft AI, and kind of took a lot of the AI development and put it under that arm. So we're seeing a similar approach from Google here,
Starting point is 00:03:58 which I think is smart, right? DeepMind is, I think, one of the leaders in the world when it comes to ability. So I think it's great that Google is moving its entire AI efforts under DeepMind. Last but not least, we go from Adobe to Google to Microsoft. So Microsoft has introduced its Phi-3 Mini model, a new small model that can run locally on phones. So apparently, Llama's reign didn't last too long, but Microsoft has launched the first of three small AI models, Phi. I think it's Phi. Is it "fie" or "pie"? Does anyone know? But I believe it's Phi-3 Mini, that are meant to be more affordable and efficient for personal devices.
Starting point is 00:04:39 So it performs similarly to larger models, but with fewer parameters and is trained using a curriculum strategy. So Microsoft did emphasize the quality of its training input versus quantity. So just much higher quality, which is, you know, allowing it to be a much smaller model with still very impressive benchmarks. So the company also plans to release Phi-3 Small and Phi-3 Medium with 7 billion parameters and 14 billion parameters, respectively. So the mini model is 3.8 billion parameters,
Starting point is 00:05:13 and it is reportedly already outperforming Meta's Llama 3 8 billion. So that's important to know there, a 3.8 billion parameter model outperforming Meta's shiny new 8 billion parameter model, as well as reportedly out-benchmarking Google's Gemma 7B and Mistral 7B models in the MMLU benchmark. So wow, a lot of AI news for today. We're going to be breaking down those stories in a lot more depth in our newsletter. So make sure you go to YourEverydayAI.com. So let's get to the topic of today.
Starting point is 00:05:48 Today's Hot Take Tuesday: is Microsoft's VASA-1 deepfake AI tech so good that it's dangerous? So I'm going to start and just give you the answer because maybe you don't have time. Yes, it is too good for public consumption right now. I need to tell people this. One thing I do here at Everyday AI is I look at literally every single, almost every single piece of major new AI technology. I've used hundreds of pieces of AI software over the years. And this by far, VASA-1 is by far one of the most impressive that I've ever seen,
Starting point is 00:06:23 I'd compare it up there with probably OpenAI's Sora. And we'll just put everything under one roof, but just kind of this under one roof, but I would say the emergence of internet-connected large language models, right? So those are the three things that I've been most impressed with. So the internet connectivity of large language models, OpenAI's Sora and this new Microsoft VASA.
Starting point is 00:06:47 So it is not available yet. All right. So I want to, we're going to take a close look. We're going to do a breakdown. And I'm also going to tell you seven things that you need to know about this model. but just as a reminder, this is for you all, right? So in our newsletter yesterday, I asked what you guys wanted for today's Hot Take Tuesday.
Starting point is 00:07:06 You all said this. But I'm also curious, you know, for our live stream audience, you know, are you worried about AI deepfakes? Let me know. Or also, if you're listening on the podcast, I always love reading emails. I keep, you know, our email in there, my LinkedIn. So connect with me. But I want to know from you, are you kind of, are you not very worried? Are you more excited? Are you kind of worried, kind of excited? Are you more worried than excited? Or are you just pretty worried, right? So I'm curious where our live stream audience stands on this, right? If I had to, if I had to choose, I would say I'm probably C. I'm probably more worried than excited overall. Don't worry. We're going to break this down, but I'm really curious where our live stream audience. It seems like a lot of people are kind of worried, but kind of excited,
Starting point is 00:07:58 you know, which is interesting. And hey, Douglas, I love this. Douglas is currently in the air. Douglas, you might be the, I guess, the first person joining everyday AI from tens of thousands of feet in the air. So thanks for that. So, you know what Michael said here, and we're going to get into this, he said, it's already difficult to tell what is real and what's not, even with this knowledge of
Starting point is 00:08:22 the tools. Absolutely. Michael, I think that's a great point. And that's something that we're going to be talking about here on today's show. So let's just go into it. Let me just tell you exactly what this new VASA model is. And again, if someone, some of my friends from Microsoft, tell me, is it VASA or VASA, right? That's the only thing with all these models that get released.
Starting point is 00:08:43 Everyone's, you know, pronouncing them differently until, you know, someone from Microsoft goes on an interview and tells us all. But we'll just call it VASA for now. So here is what VASA is. And then we're going to give you all a live look and a live listen for our podcast audience as well. So essentially, it is a deep fake technology, right? Whether you want to call it that or not, this is Hot Take Tuesday, I'm saying it like it is.
Starting point is 00:09:11 This is a deep fake technology. There's a big difference between a digital twin or a digital clone or a, you know, AI avatar versus deep fake technology, right? So essentially what this boils down to is this new VASA-1 research paper from Microsoft. Again, it's not public, it's not out, right? But you can go look at the results, but it allows you to take any image and a voice and create a talking head, more or less, right? There's obviously great applications, which we're going to be talking about. But I'd say right now the danger of this far outweighs the potential benefits, right?
Starting point is 00:09:51 This is one of those AI tools where I'm like, okay, is this a solution? quote unquote, looking for a problem. Maybe it is, maybe it's not. And again, I think there's great, there's great positives. There's great plus sides to this. But the biggest difference between deep fake technology and, you know, digital twin or AI avatar technology is its use cases, right? And for the most part, and we're going to talk about this,
Starting point is 00:10:21 but there's great kind of digital twin or, you know, AI avatar platforms out there. but they don't wield this much power, right, where you can literally take any face and make it say anything you want in a very realistic voice with eerily similar movements to a human being, right? So let's just, you know, instead of me trying to spend, you know, another 10 minutes describing this, let's just go ahead. Let's take a look at this technology in action.
Starting point is 00:10:53 Okay. So if you're on the podcast now, so what we're doing here is we're going through the research paper. So we shared this yesterday actually in our AI News That Matters Monday recap as well as I think last week. So if you haven't taken a look at this yet, I encourage you to do so. All right. So I'm just going to go ahead. I'm going to play one or two samples. And again, for our podcast audience, I'm going to try to do my best to explain what's going on here. But again, as a reminder, all this is is a single image, a single audio clip, and then the user, again, this isn't public, but then the user has control over a lot of different things, what Microsoft is calling control signals. All right. So let's just go ahead.
Starting point is 00:11:40 We're going to watch maybe 10 second clips of a couple of these. So here is the first one. So let's just go ahead, take a watch, take a listen. If you plan to go for a run and you don't have enough time to do a full run, do part of a run. If you plan to go to the gym today, but you don't have the full hour that you normally work out, do some push-ups.
Starting point is 00:12:02 The crazy thing about that one is it looks like it is, and maybe intentionally, adding this like staticky background noise, which makes it seem even more human and realistic because not all of these voices have that. All right, let's go ahead and watch and listen to a couple more. Again, you know, if you are joining on the podcast, it shows you the single image that this was created with, And then you can click play and watch and listen.
Starting point is 00:12:24 And all of this is generated through this VASA-1 model. Surprises me still. I ran it on someone just last night. It was fascinating. You know, she had complained of shoulder like pain in her arm. It was excruciating. Do you hear that? The tone changes, the hesitation, right?
Starting point is 00:12:51 almost like sometimes I know I stutter on the show or I, you know, kind of mutter and, you know, go off on these little side tangents like they're doing this as well, which is crazy. Just the varied inflection in the voice, the cadence changes. So number one, even if this was a text to speech software, I'd be like, whoa, okay, that's pretty impressive, right? Like, we're talking like ElevenLabs quality already, maybe even better off the bat, which is pretty impressive. So now let's just listen to one more.
Starting point is 00:13:26 And then I'm going to, I want to know from you all. What are your thoughts on this, right? And if you do have questions, please get them in. But let's just go ahead and watch and listen to one more example. And then I'm going to show you some of these advanced capabilities that I think actually make this a little scary. All right. Let's go ahead and listen to this one.
Starting point is 00:13:46 But you can imagine I have a lot of questions. So I'd love to begin with you firstly, just because I read that you started out in advertising, and now you run a wellness business. All right. So yeah, you're kind of seeing where this is going, right, with that example there. Some more on that in a second. Okay. So now I want to talk a little bit about some of these additional features.
Starting point is 00:14:11 So this one here, and for our podcast audience, essentially you can set the eyes gaze, the eye gaze in different directions. So you have, you know, there's going to be four kind of videos playing at once with only one piece of audio, with the eyes going in different ways. Ready? I would say that we as readers are not meant to look at him in any other way, but with disdain. Okay. That's extremely impressive. One voice. And this, I cannot emphasize enough for our podcast audience. Check your show notes. You can come back and watch this on LinkedIn or YouTube or wherever, you have to see this. If you haven't seen it yet, you need to understand the quality. I cannot emphasize enough. This looks extremely human. All right. So now we have
Starting point is 00:15:00 one more example. And then this is showing kind of different sizes of the head, right? So you can have kind of a super close up or something that's a little more wide. So here we go with this one. But you can imagine I have a lot of questions. So I'd love to begin with you. And then you can see, obviously, different voices for the same image, right? Different languages, right? So now let's go. And this is probably the one that's, you know, kind of taken off and gone viral on social media.
Starting point is 00:15:35 But this is also important to note about the training data and the training set. So I kind of wish Microsoft told us a little bit more about what it trained on. However, it did say that a lot of these capabilities are obviously generative, right? They're generative. So it's not like, oh, it's been trained on, you know, one million fake faces. No, this is generative. So you can create obviously faces and videos that obviously did not exist in the training set. So this is some, some.
Starting point is 00:16:10 I'd say that's a worrying piece of it, but so impressive, right? I keep personally being torn between this is extremely troublesome versus this opens up so many doors and is extremely exciting. But let's just go ahead. So this is probably the one that's kind of taken over the Internet because it's also works of art. So yes, you know, the examples that I've played so far look extremely realistic, extremely human. And we're going to break this down in a more detailed level. But here we have the Mona Lisa singing and rapping. Let's take a listen, take a watch.
Starting point is 00:16:48 Yo, I'm a paparazzi. I don't play no yacht. See, I go pop, pop, pop, pop, pop, pop my cameras up your crotch. See, I tell the truth from what I see and sell it to Perez Hilty. Don't call me scurzy making money. That's my job, celebrity. What hell no, I'm not needy. I'm legit.
Starting point is 00:17:05 Wild. It's wild, right? I can't explain the level of detail in the faces, right? I've watched some of these clips over and over and over. As someone that's on camera every single day, I see this. I see the power of this. And this is something that, you know, and we're going to draw the line here between deep fakes and, you know, digital twins or AI avatars. And the level of detail in the face, right, eyes going in and out, eyebrows rising, you know, wrinkles, right, wrinkles in your crow's feet area, right, in the corner of your eyes. You know, all of these traits that we feel are uniquely human and have never been touched by current AI technology, no longer, right? Which is what makes this both so scary and so incredibly promising, if used correctly. All right, let's just look at, we're going to look at just two more quick examples here. So this one goes to show. So we have the same voice in three completely different faces, right? So this is where you start, maybe you saw the first ones where it's just one person talking
Starting point is 00:18:23 at a time and you're like, oh, this is great. You know, I don't see really a problem with this until you see, okay, here's three people. you know, three unique faces and one piece of audio. We'll prevent those cavities from getting worse and prevent new cavities. Just because you treat cavities, it doesn't mean they can't get cavities in any other tooth. So that's wild. So in this one you have, you know, three different women, three different ethnicities, three different ages, you know, talking about cavities. Maybe I should listen to them and then I'd be able to spend more time on Everyday AI and less time at the dentist.
Starting point is 00:18:59 But still, this is one of those where I think most people will see that and get worried. This is one where the marketing and the advertising part of my brain is just going off, right? Because you're like, oh, wow, now you can quickly A/B test, right? You can quickly, you know, put a bunch of these videos on your website and see which one's resonating the most with your target audience, right? And to be able to do that quickly and at scale is wild, is wildly exciting about the type of possibilities that this opens up. But obviously, it's troubling. It's troubling, right?
Starting point is 00:19:37 So the last one that we're going to look at here, and then I'm going to tell you seven things that you need to know and get to your questions. So if you do have questions, please drop them in now. So the last one that I'm going to show here for our podcast audience, this essentially just gives you different controls. So in this video, it's going to show the different controls that you have. So it's things that if you are a video editor, you're probably familiar with these.
Starting point is 00:20:03 So things like pitch and roll and X and Y axis, gaze, right? So it's not just random, right? You have so much fine-tuned control. And again, all it takes is a single image, a single piece of audio, and you have all of these controls. So when I hit play here, you're going to see, you know, presumably this is a researcher at Microsoft who's, you know, kind of recording their screen as they're doing this. But you can have such fine-tuned control over this, you know, deep fake clone here,
Starting point is 00:20:39 changing every aspect of it. So let's take a watch and a listen. And I might narrate on this one as well or hit pause.
Starting point is 00:21:55 I decided to focus all my attention, all my time on listening. So instead of doing something else, I just listen. All right there, we switched to now we went from a presumably female speaking to now, presumably a male speaking, even though it sounds like a female, right? So just with the click of a button, didn't have to regenerate, didn't have to rebuffer. That's another thing we're going to talk about, is the latency reportedly on VASA-1 is amazing, in near real time. And listened and listened. Because I'm a true believer that if you're really bad at something like listening, for example, it only shows you that, hey, you have to practice.
Starting point is 00:22:45 Okay. So now what's talking, now what's happening is someone just put in a new kind of script, right? And I believe that they're going to show this generating in real time. And then this is what I want to kind of talk to everyone about and show everyone. Listening as much as you can. We introduce VASA, a framework for generating lifelike talking faces with appealing There we go right there. So now we are seeing something that makes this, again, both awesome and frightening. So what the user, presumably a Microsoft researcher, because according to Microsoft, that's all that really has access to it right now. In real time, they are generating new scripts swiping between the different faces, right? So again, they're going from as an example, male, female, young, old, same voice in real time,
Starting point is 00:23:35 but also they are dragging and dropping on the face that is speaking. And then what they're doing is they're adjusting kind of the angle of the face. And I cannot explain how much power, how much computing power this would generally take. And this is, again, happening in real time, no buffer. and then they're going to be moving this face around. I'm going to play this for just a couple more seconds. Visual affective skills given a single static image and a speech audio clip. So I'll tell you this.
Starting point is 00:24:09 As someone in a previous life that did a lot of videography, that did a lot of, I put together a lot of stories. And so many times you would have to have maybe two or three cameras because you want to catch these different angles, right? Wow, all of a sudden, you don't need that anymore. You have infinite angles, infinite people, and they can say whatever you want. And it looks pretty realistic.
Starting point is 00:24:35 So we're going to get to why that's a problem, but also extremely powerful. Right. All right. So I told everyone that I'd give you seven things that you need to know. Great question here from Tanya. So Tanya, thanks for the question, saying if it's something we don't want to listen to, does it matter if it's real or fake? That's a great question.
Starting point is 00:25:05 And also, it's like, okay, when or if the world gets access to this, all right? But I don't think that that's going to actually be an issue because regardless of what Microsoft does, I think that we're going to be seeing this technology, whether we want the public to have it or not. Cecilia, with a great, a great observation here saying the facial expressions in the last few were scary real. Yes, they were. All right.
Starting point is 00:25:33 So let's go ahead. Let's go over the seven things that you need to know about VASA 1. Okay, so again, a quick recap of what it is and what it does. So it is deep fake. That's what it is, right? It is deep fake real time. So that's another thing that they talk about in the paper, how little processing power it actually requires, right?
Starting point is 00:25:56 You don't need a $30,000 GPU. It's a commercial, what Microsoft is saying, it's a commercial-grade graphics card. So it is near real time. So what that means is, in theory, you might be talking to this person on a Zoom sales call. And it looks extremely realistic, right? It only needs one photo and a source of audio.
Starting point is 00:26:20 You have these kind of post-processing controls. It is very realistic head movements, controllable motion. And again, the human-like tendencies that this captures is amazing. You know, right now it's only, I believe, 512 by 512 pixels. So it's not, you know, HD quality or 2K or 4K quality yet. But I think that's probably where it's heading. Another thing that you need to know about this, it's, I can almost guarantee that this VASA model, whether they're actually on two now or on a next.
Starting point is 00:26:58 consideration, right? It takes time to get your paper approved. It takes time, you know, to kind of go up the corporate ladder and say, are we going to release this? Is the world ready for this? So in theory, this technology could be six months old. It could be 18 months old. We don't know, right? They could already be on VASA 2.5 that does 4K, even faster. We don't know. All we know is that this is not today's technology. It has presumably already been greatly improved. All right. So that's number one. There's just a quick overview of what it is.
Starting point is 00:27:32 Number two, VASA-1 is not released yet. All right. So Microsoft did say that they do not want to release this to the public until they feel it is safe to use. But hey, here's, hey, Hot Take Tuesday coming out. When would this be safe? That's an honest question. I don't know, live stream audience.
Starting point is 00:27:54 What do you think? When would this ever be safe? Again, I understand. the upsides. I understand the overwhelmingly positive impact this could have on society. But is there a scenario where this is actually safe to put this especially in the hands of the public? Again, I don't know. Are they going to have certain safeguards? You know, as an example, is that maybe you can't upload your own photo. Maybe you can only use their preset, you know, photos or, you know, AI generated. I don't know, right? There's so many unknowns.
Starting point is 00:28:31 there's not a lot of knowns. The research paper is not very long. And again, this isn't available, right? So if this ever does become available, it could even, it could, in theory, be robustly even more powerful or it could be stripped down from, you know, what we just saw. So we don't know. But it is not released yet. But they're also not the only one, right?
Starting point is 00:28:54 So China's Baidu, Baidu, gosh, I always forget, guys. I say so many names that I forget pronunciations, but China's Baidu also had a similar model called Emo, EMO. I don't think it was nearly as impressive. It was about six months ago, so they probably have, or maybe not six, maybe it was about three months ago. But they already have, I'm sure they already have a newer version. But regardless, even though this is not released yet, and Microsoft said that, you know,
Starting point is 00:29:24 they don't, they didn't necessarily detail plans on when or if it would be released. they said that they need to ensure that this is safe. I don't see a scenario where this is safe. Sorry. I don't. There's, I think there's way more potential for misuse than there is for positive use cases. Again, I think this is a very cool solution looking for a problem. But I don't think it matters, right?
Starting point is 00:29:52 I don't think, you know, it's not like Microsoft's is the decision maker on this. Like we said, there's already been a pretty impressive. variation of this with China's Batu, their emo model. So I think whether Microsoft releases this to the public or not, there is going to be another company, whether it's Google DeepMind, whether it's someone we've never heard of creating this. All right, number three thing you need to know, similar technology is already public, right? And again, this is where we kind of have to differentiate between what is a deep fake versus what's a digital twin. So there's already great platforms.
Starting point is 00:30:30 So as an example, Synthesia, Hey, Jen, Hour 1, D-I-D, etc, where you can already create a digital twin. For the most part, there's two sides of this. So one is, yes, you can create one of yourself, right? You can, you know, green screen set up, you know, record yourself, different angles, all that. But that is a much more detailed process, right? I'm actually have a secret project that I'm working on that I'll probably tell you guys about soon. I love that part of the technology, right? Where if you wanted to, right, so let's just say you're head of HR, your head of learning and
Starting point is 00:31:08 development and you can only do so much, right? Okay, well, what if for controlled purposes and, you know, generally you have to sign a lot of documents and, you know, there's a process to go through in creating a digital avatar. But I like that aspect of it, right? Like imagine if, yeah, if you're in charge of learning and development at a huge company and you don't have enough time to train people. Maybe you want to train people on AI, but you don't have the time. Okay, there's a cool use case.
Starting point is 00:31:34 So this technology is already, similar technology is already available in these kind of digital twin or AI avatar spaces, Synthesia, HeyGen, Hour One, D-ID, there's a couple others, right? But it is not nearly as powerful. And maybe these companies made it that way, right? Maybe they said, we don't want it to be, you know, you can go choose literally any photo, could say anything, although you have a little bit of that, but it doesn't look as realistic. So this new Microsoft research paper, VASA 1, and the examples are, they look so human. It is scary.
Starting point is 00:32:10 As someone that's watched, you know, an uncountable amount of, you know, these new products, products with these digital avatars, you know, for the most part, you can tell that they're digital. You can tell that they're AI, which I think is actually a good thing, when it doesn't look too realistic because then you start to, yeah, blur this line between what's real and what's fake. All right. Number four, VASA 1's quality is outstanding. So like I said, the realism is uncanny.
Starting point is 00:32:39 And the control for deep fakes right now is something that is not publicly available. So even as an example, the different, you know, zooming in and zooming out of, you know, where you want the head, the eyes looking in different directions, you know, being able to click on something in real time and, you know, essentially generate all these different angles of someone talking live. The quality and the realism is outstanding. Again, I'm sure that there's already a 1.5 or a V2 that researchers are already building. That's way better. That's HD.
Starting point is 00:33:11 That's even faster. That requires even less compute. All right. Thing number five you need to know. There are so many positive use cases for VASA 1. Yes, we talked about the negatives, but training, personalized learning and development, helping to break down communication barriers.
Starting point is 00:33:30 That's a great one, right? You know, people that have certain, you know, learning disabilities or cognitive impairments, right? This technology could be extremely helpful. Maybe there's people out there that, you know, don't learn well or can't understand things without, you know, really looking at a person speaking it to them, right? Or maybe they just learn better.
Starting point is 00:33:53 So I think that there's so many positive use cases that can change the world in a good way, can help society. Right. So obviously, I can't overlook that. But number six, is it so good it's bad? Sorry. This technology is so, so good.
Starting point is 00:34:15 I just think it's bad, you know? And I am the type of person that loves deploying AI out in the wild, right? If I had a client, right, and they said, hey, Jordan, we want to start, you know, doing all this with, with this VASA model. Again, it's not, it's not openly available. It's not public. I would say, are you sure? Are you sure you want to? Right.
Starting point is 00:34:38 Again, maybe, maybe, you know, I think a lot of times I think that spending time doing this every single day that, you know, I'm personally, quote unquote, ahead of the AI curve. And maybe this is just, you know, old school in me, you know, old man Wilson shaking his fist on the porch saying, ah, this new technology, right? I don't know. I don't know about this one, right? It gets to, especially when we talk about misinformation, disinformation, right? I said yesterday, but, you know, imagine this before the U.S. election. Imagine if you could, right?
Starting point is 00:35:16 And again, the technology's there, but it's not good anywhere else and it takes a long time. Imagine if this was available now. Imagine if someone could take your photo and make you say anything. Again, we don't know when and if this is released, what the capabilities are. I'm not saying that that is going to be a feature or an option in this model or others, but that's where this technology is heading. And whether, you know, I guess it's luckily a company like Microsoft who I think is doing things in a pretty ethical and responsible manner, but I can guarantee you that this technology very soon will be replicated by bad actors,
Starting point is 00:35:56 right? Big tech companies, right? It could be other countries using this for disinformation, and maybe we just don't know. So I do think it's important to talk about this, to have a conversation. And yes, as excited as we could be and say, oh, I want to use this for, you know, this project and this project and say, okay, wait, do we really need that? because I do think at least early on, the potential for it being bad far outweighs the potential for it doing more good.
Starting point is 00:36:28 Just because I think the overwhelming majority of people, maybe not you, right? Like if you're tuning in here every single day, right? If you are someone that's using generative AI every single day, I think you probably have a little bit better of an understanding of how this could be properly used in society. and how maybe the good could outweigh the bad. But you are in the 1%. You're in the 0.1%, right? If you're tuning in here every day,
Starting point is 00:36:54 if you're pushing generative AI use at your company, right? You are still in the minority. You know, because 99.9% of people that would see something like this would not know. And that just poses so many implications. And then last but not least, I think if nothing else, VASA 1 is going to prepare us for a new normal.
Starting point is 00:37:17 Whether we want this or not, it's coming. You know, I know that's, that's, that's weird. It's weird to say whether you like it or not, this is coming again. Maybe not Microsoft, maybe not, you know, these other companies that I named. But there's going to be maybe unnamed companies. This is going to happen. We are going to see deepfakes that are so incredibly realistic of people like you and me, whether we give them our consent or not, this is where
Starting point is 00:37:55 the technology is heading. No, I'm not a doomsday AI prepper. You know, I'm not being, you know, freaking out here on the Everyday AI show. I'm being realistic. We technically, the average person does not get a say in where this technology heads. And the capabilities and the power of what we've seen in this VASA model, I think will unfortunately inspire bad actors. So we have to understand that whether we want this technology to exist or not, it probably is going to, right? And I think that's an important conversation to have.
Starting point is 00:38:37 So let me know. What are your thoughts? Is this good? Is it bad? Again, we don't know if we're going to see VASA 1 in the public. But what are your thoughts? And a quick recap, right? Here's the seven things you need to know.
Starting point is 00:38:55 So what is VASA 1? It is a deep fake technology. One photo, one video, tons of control. Number two, it is not released yet. Microsoft said it wants it to be safe. We don't actually know if they'll even release it. Number three, similar technology is already public. So we have the digital twins and AI avatars from companies like Synthesia, HeyGen,
Starting point is 00:39:15 Hour One, D-ID, etc. But then we also have very similar technology like China's Baidu, the EMO model from Baidu. Number four, the quality is outstanding. You cannot tell. I mean, you literally cannot tell by looking at it. I'm staring at it on an HD screen. You can't tell.
Starting point is 00:39:35 It is uncanny. Number five, there's so many positive use cases for this technology, training, learning, and development, helping break down communication barriers. Number six, I think it's so good, it's bad. The potential for misinformation and disinformation, political campaigns, you know, personally, I think that there's maybe more bad than good. And number seven, I think this prepares us for a new normal. Again, maybe VASA 1 is just putting this all on our radars.
Starting point is 00:40:01 Maybe this will never become public. Maybe this is where it stops. But this is not where this story of AI deepfakes ends. All right, y'all. I hope this was helpful. Like what Moses says, it's good, but all things can eventually get used for evil. I agree. This is an important conversation to have.
Starting point is 00:40:26 So, hey, if you're listening on the podcast, make sure to check out your show notes. Let's keep the conversation going. If you're joining on LinkedIn, YouTube, wherever, I think there's a ton of very smart people listening in, talk with each other. I think this is an important conversation to have. So thank you for tuning in. Make sure to join us tomorrow. We're going to talk about generative AI, how you can turn trash into treasures with James Daniel, the VP of AI from LanzaTech, as well as Thursday. Oh my gosh. Can you guys believe we've been doing this for a year? We're still here.
Starting point is 00:40:58 We're still around. Thank you for that. So we are celebrating our one year anniversary of the Everyday AI show. By doing this, we are going to go back and redo our very first show. Our very first show posed the question: Will AI take our jobs? So one year later, we're going to ask that same question. I think it's just like today's conversation, I think it's an important one. And we don't like to talk about that. We don't like to talk about the downside of AI. We like to talk about, oh, I use, you know, ChatGPT to do this. And, you know, I gain back all this time.
Starting point is 00:41:30 That's great. But we have to talk about the hard questions. That's what we do on everyday AI. Thanks for tuning in. If this was helpful, let me know. Please consider leaving us a rating, sharing this. You know, if you're listening on social media, hit that repost. It takes us sometimes 10, 15 hours to put together one single episode.
Starting point is 00:41:50 It takes you 10 to 15 seconds to repost this, share this with your network, tag your friends. Thank you for tuning in. Go to YourEverydayAI.com for more. Hope to see you back tomorrow and every day for more Everyday AI. Thanks y'all. Meet Firefly AI Assistant. Now live in Adobe Firefly, the all-in-one creative AI studio. Just describe what you want to create in your own words and the assistant handles the rest,
Starting point is 00:42:21 orchestrating multi-step workflows across Adobe Creative Cloud apps, including Photoshop, Premiere, Express, and more in one conversational interface. You direct the outcome while the assistant accelerates execution. Stay in control with the ability to step in and refine at any time. See it today at firefly.adobe.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating.
Starting point is 00:42:54 It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
