Everyday AI Podcast – An AI and ChatGPT Podcast - EP 366: ChatGPT’s New Advanced Voice Mode - 5 things you need to know and live demos

Starting point is 00:00:00 This is the Everyday AI Show, the Everyday Podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Do we have our first live AI assistant? Maybe.

Starting point is 00:00:51 So OpenAI just released its new advanced voice mode. So after teasing this like four months ago, OpenAI finally unveiled this highly anticipated new functionality, bringing a live AI agent you can talk to to most paid users. So I'm going to talk about what you need to know. Well, actually the five things that you need to know about this new OpenAI Advanced Voice Mode and do some live demos as well. I'm excited for today's episode. I didn't sleep a lot, right? I was up playing with this thing, testing it, making sure that we could bring you just the news on this that matters. So what's going on y'all? My name's Jordan Wilson and I'm the host of Everyday AI. This thing is for you.

Starting point is 00:01:43 It's a daily live stream podcast and free daily newsletter helping everyday people learn and leverage generative AI to grow their companies and their careers. So if that sounds like you, you are definitely in the right place. Literally, we do this every single day. So make sure if you're listening on the podcast, check out your show notes. We always leave a couple of helpful links, related shows, but also more importantly a link to our website. Please go there, youreverydayai.com, and sign up for that free daily newsletter. Every day we recap our episode, so if you're on the treadmill or walking your dog, don't worry, we already took notes for you. You don't need an AI assistant. That's what I do. All right, so before we get into the new ChatGPT advanced voice mode and do some live demos, think out there: what do you all want to hear? What do you want to see demoed? Go ahead and throw that in there, but I'll ask for it again later. But before we get started, let's start as we always do by going over the AI news. So, a new study has revealed that generative AI adoption is surging, outpacing previous tech revolutions.

Starting point is 00:02:49 So a new study by researchers from the Federal Reserve Bank of St. Louis, Vanderbilt University, and the Harvard Kennedy School reveals that generative AI is rapidly becoming a part of everyday work life, surpassing the adoption rate of personal computers and the internet. So this trend raises questions about its implications for the workforce and productivity. So just two years after, well, almost two years now, just about two years after the launch of ChatGPT, the study reported that 39.4% of Americans between the ages of 18 and 64 use generative AI, with 28% using it in their jobs. In comparison, it took PCs three years to reach a 20% adoption rate. So this research indicates that generative AI is not just limited to tech professionals, with one in five blue-collar workers, including those in the construction and transportation industries, regularly using generative AI technology on the job.

Starting point is 00:03:53 Next, Google has released some updates to both its Gemini AI models and Google Workspace. So first for Google Workspace and the Gemini integration. So with the Gemini integration, Google is including the standalone Gemini app as a standard feature in Google Workspace Business, Enterprise, and Frontline plans starting in the fourth quarter of 2024. So this integration eliminates the need for a separate Gemini add-on, making the AI assistant more accessible to millions of its users. Google has also announced a significant update to its Gemini AI models, launching Gemini 1.5 Pro-002 and Gemini 1.5 Flash-002 (love the naming here), which are now production ready and more affordable. So this move is crucial as it aims to make more advanced AI technology more accessible to businesses and developers.

Starting point is 00:04:50 So the new model achieves a 7% improvement on the MMLU Pro benchmark (that's 7%, not 7 points), as well as a 15% reduction in pricing. That's always one of the most important. So I'll say this. I think Google is great for developers. If you're using it on the front end, you're not going to see most of these new updates. All right, last but not least, Meta Connect kicks off today with a focus on AI, VR, and mixed reality. So Meta Connect is the highly anticipated developer conference hosted by Meta, the parent company to Facebook, Instagram, and more. And it's set to showcase new advancements in VR, mixed reality,

Starting point is 00:05:30 and AI. So the event begins today at about noon Central, so by the time you're listening to this podcast it might already be happening, featuring a keynote from CEO Mark Zuckerberg. And Meta is expected to unveil a lower cost version of its VR headset, which should be pretty interesting to see how it competes with the way more expensive Apple Vision Pro, as well as updates to its AI-enabled Ray-Ban Meta smart glasses. AI will obviously be a central theme at the event, with Meta expected to highlight its generative AI initiatives that are being integrated across its platforms, including the introduction of customizable

Starting point is 00:06:09 AI chatbots on Instagram. All right, a lot more in our newsletter. So make sure you go check that out. So what's going on? I'm excited to get into this advanced voice mode. So real quick, if you're hearing something a little different, maybe I sound different. Maybe I look a little different. So normally, you know, I'm recording on my iPhone as the camera, and I have a little bit different mic setup. So if I sound weird, if I look a little different, sorry. So I'm actually going to be later in the show, I'm going to be doing a live demo. So I have my phone that I would normally use for recording. I'm going to be doing the advanced voice mode as long as it works. Hopefully it does. So my microphone is right there. So apologies if the audio video is a little

Starting point is 00:06:56 off today. Hopefully y'all don't mind. And just so I know, live stream audience, can y'all hear me? Is this good still? I'll try to speak up, but let's just get into it. Here are the five things that you need to know about ChatGPT's new advanced voice mode. All right. I'm excited. Let's get into it. So number one, the availability. That's what everyone cares about. So it is available to paid ChatGPT Plus and Team users. There are some country restrictions, which I'm going to get to in a little bit. But right now, it is not available to free users. It is not available to enterprise users.

Starting point is 00:07:38 And it is not available to EDU users as well. So right now, you know, ChatGPT has different plans. So if you are on a paid ChatGPT Plus or paid ChatGPT Team account, aside from if you're in certain countries in the EU, UK, etc., you should have access to it at least by the end of the week. So here's a little tip from being a nerd for a long time. A way that you can normally bypass the line if you want to, this is what I did because when this was announced, you know, I didn't have access.

Starting point is 00:08:11 So it's a slow rollout. So if you want to, log out of your devices. And this is mobile only, all right? This is not on the desktop version, at least as of like two hours ago. So it's not on the Mac app, it's not on the desktop version. This is just on the mobile app. All right. So one thing you can do is first log out of your account, log back in, see if that gives you access.

Starting point is 00:08:35 If not, delete it, then reinstall it. So that's what I did. And after doing that twice, then I did have access. All right. So it might take a while. And if you have multiple accounts, like I do, try switching between those, right? So yeah, you might still have to wait a couple of days, but Sam Altman did tweet out that most paid users that are eligible should have access by the end of the week.

Starting point is 00:08:59 All right. So that's always first. You know, people are like, oh, how do I get access, right? And yeah, hey, thanks, Liz and Brian, for letting me know that you can hear. And, you know, I'm also curious, did you guys get access yet? You know, if you do have a paid ChatGPT Plus plan, let me know if you did or did not get access. So that's number one. Number two, there are a lot of guardrails in this thing, which makes sense, right?

Starting point is 00:09:26 And also, it's worth noting, and I'm going to go into the details in a little bit. This is not the complete version that we saw. All right. So in its spring event, OpenAI demoed kind of this live voice mode, but it had way more capabilities. That is not what we are seeing here, okay? In the demo that they did, ChatGPT could literally see with the camera, right? On the desktop Mac app, it could kind of interact in real time with whatever was going on on your screen.

Starting point is 00:10:01 This is not, this does not have any of those features, right? So you can't see with it, right? So a lot of people call this Her. I've been calling it kind of live Omni. I don't know if there is a name for this, right? So we have the advanced voice mode, and kind of the benefit, right, is it sounds like you're talking to a real person, right? It's wild.

Starting point is 00:10:23 It is a more neural voice. You know, the voice capabilities just sound much more realistic. The latency is low. You know, you can interrupt ChatGPT, which is kind of nice. I know it might seem rude, but, you know, sometimes if you get a very long answer from a voice assistant, you're like, oh, gosh, why did I even ask this? Now I have to wait 20 seconds, right? So there are obviously huge benefits to this more neural, updated, advanced voice mode,

Starting point is 00:10:51 but this is not what we've seen in the spring event from OpenAI and what they continue to demo all over the place, right? It is not that. It is not on the desktop. It is not on your computer. This is on your phone. And it cannot see or interact. It can just hear.

Starting point is 00:11:09 It can listen and you can talk with it, right? It's literally like talking with a human. It's pretty impressive, but there's a ton of guardrails, which is probably a good thing, right? Because the first thing that happens with large language models, everyone tries to break them. Everyone tries to push them to their limits. So a lot of guardrails. So as an example, there's some simple things, some things that kind of make sense when you think

Starting point is 00:11:32 about them a little bit, and then some things that are like, okay, maybe this doesn't make sense. So it can't sing, right? That's one thing people are trying to get it to do. It actually will sing, right, if you care about that. But if you ask it to, it won't. But there are some ways you can get around it. Just like with most things, you know, you can kind of jailbreak a large language model or you can work in the gray area. You know, you can get it to replicate some things. But if you want it to sing to you, not going to happen. You can't make it stutter, right? It says, hey, that's not respectful. You can't impersonate famous people, right?

Starting point is 00:12:09 I'm sure that's one thing people are going to be running to try to do. You can't. However, what you can do is you can describe a famous person or you can even give it an example, you know, with your own cadence and intonation. If you do want it to sound like someone, you can do that. And you almost can give it verbal commands, right? Oh, make your voice higher. Make it lower. Right.

Starting point is 00:12:32 So there are five new voices. And I'm not going to be able to show you all of them because it actually screws up my recording. I don't know why. But we'll go over one of them. So yeah. And then you can't, you know, I didn't test this out, but I saw other people, you know, saying, oh, be flirtatious, do this, right? So that doesn't work either. So if you think that this new advanced voice mode is going to be your AI boyfriend or AI girlfriend, probably not going to happen. Sorry for people who are into that. I don't understand that, but yeah, apparently that's a thing. So, you know, a lot of guardrails on the advanced voice mode, which like I said, is probably a good thing. I'm an advocate for the text version to have fewer guardrails, right? Obviously, you still have to have safety and ethics in mind and build around those things first.

Starting point is 00:13:24 But there are more guardrails, I would say, on the advanced voice mode versus the kind of, quote unquote, text version. All right. Hey, and keep your questions coming. Yes. So Marie is asking, can it vary its emotional spectrum? Absolutely. So we're going to do that in the demo. So yeah, that's another great thing about this is you can have it talk with emotion, but it can also sense some emotion from you as well. So, you know, if you do have requests that you want to hear, live stream audience, get them in. Maybe so I can see them if we get a lot, put a bunch of, you know, exclamation points before, because I always see a ton of comments coming up here and it's hard for me to track them. So yeah, if you have any requests, we can do that later. All right, the number three thing. The third thing that you need to know. This is the most advanced kind of AI-powered voice chat available right now. Right. So yes, Google has rolled out its AI assistant, but you have to have certain phones.

Starting point is 00:14:28 So technically, because this has iPhone and Android, there's much wider availability for this ChatGPT voice mode. So like as an example, I have an iPhone. So I can't go, at least right now, right? Google may roll that out in their Google app. They did bring a big update last week, bringing Gemini to the Google app on iPhones. But right now you can't have this live assistant on an iPhone, and the iPhone is the biggest smartphone in terms of market share, right? So this is the widest available AI smart assistant right now, right? Oh, you could say, oh, Siri. You could say, you know, Alexa. No. If I'm being honest, those aren't AI-powered smart assistants. They're a little dumb. Let's be honest,

Starting point is 00:15:20 right? Ten years later, right? You can barely use them. So this is nothing like those kind of, quote unquote, smart assistants. And you will see in the demo what I mean. This is wild. It is like talking to a real human. It is intuitive and it is fairly impressive if you haven't seen it yet. All right. Number four, and this one's big. When you're using advanced voice mode, none of the other tools or modes in ChatGPT are available.

Starting point is 00:15:57 That is a huge bummer. And again, this is obviously much different than what OpenAI demoed. Right? So is this, you know, a lot of people are saying, This is a bait and switch. This was marketing. Yeah, I mean, you can make, you can make that claim, I guess.

Starting point is 00:16:16 However, it's like, beggars can't be choosers, y'all. Like I said, I've seen all the Google Gemini kind of live voice demos. They're pretty good. This is better. Even with what we have, yes, it's not what we were promised, right? It can't see the world around us. You can't use the internet, right? So that's actually a huge downside.

Starting point is 00:16:38 Presumably, OpenAI will be updating this and bringing these other tools. So what that means, this is extremely limited in the use, right? So you can't use, as an example, Browse with Bing. Okay? You cannot type. Oh, that stinks, right? So if I start a chat in advanced voice mode and then I go type, whether it's on my phone or I pick it up on my computer and type, I cannot go back to advanced voice mode. So you have to stay in there.

Starting point is 00:17:10 So if you want to keep the context of a conversation going, you can't do anything else, right? So number one, if you ask the advanced voice mode, it can't access the internet. You can't use, you know, DALL-E, you can't use it inside of GPTs. All right. So there are a couple, if I'm being honest, huge limitations. But all things considered, that doesn't take away from the core functionality. If you wanted a live AI assistant with this, you're not getting it. Sorry.

Starting point is 00:17:43 What you are getting is what seems like almost a real life human that you can talk to. That has the, I won't say the most up-to-date knowledge base, right? Because it can't access your files. You can't upload your files. You can't use it inside of GPTs. You can use custom instructions. All right. So this might be the first time that I ever use custom

Starting point is 00:18:07 instructions. So a piece on that now because I think it's actually important to talk about. So generally, I would never recommend any serious person use custom instructions. The reason why is because custom instructions and memory are two different features inside of ChatGPT. The problem is if you are, I won't even say a power user, but if you use ChatGPT a lot like I do, you probably use it for all different types of work projects. You probably use it for personal things. You probably use it for a range of things. So the problem with custom instructions and memory is that they are unilaterally applied to all of your chats. All right. So this is why I have never, you know, used custom

Starting point is 00:18:49 instructions or memory because I'm using ChatGPT for so many different things, so many different projects. I have multiple businesses. I have multiple clients. You know, our team has multiple clients. So we're using it for all kinds of different things. So we don't want one set of instructions or memory to be applied unilaterally everywhere. And right now, when you go inside of a new chat, you can't just say, oh, let's turn off custom instructions and memory for this chat. It is an account level setting, which is so, so useless, if I'm being honest, right? So OpenAI did even put out a whole video about, hey, you should use this with custom instructions.

Starting point is 00:19:29 So I might do that, but you just have to really be careful and be cognizant about turning that on and off at the account level whenever you want to use it. All right. So, you know, downsides there in terms of it being able to offer you something personalized. However, what you can do in lieu of that is just start the conversation off with a set of essentially custom instructions. So then you can kind of have the best of both worlds. It's a little annoying. But maybe if you want to use this over and over, right?

Starting point is 00:20:02 So one huge use case that I'm excited about is if you have a long commute, right? You literally now have an actual personalized consultant in your pocket. If you are a student, you have a personalized tutor in your pocket that you can just talk to. It can adapt to you. It learns from what you're saying, what you're not saying. It can understand your tone, you know, kind of your emotions. It's wild. It is so, so useful. So I do think at least right now, while there's not access, so you can't upload documents, that stinks. You can't use GPTs,

Starting point is 00:20:44 that stinks. You can't use Browse with Bing. That stinks. You can't use, in voice, Advanced Data Analysis, Code Interpreter. That stinks, right? So I can't have my assistant go code me a website by my command, and I can't talk to it and walk it through those types of projects. So there are limitations today. I am assuming those limitations will eventually be lifted because OpenAI, like I said, we know the technology there exists, but they are presumably rolling this out. They're giving us the most basic features first, right? Because OpenAI, even since their spring event, that was back in April, I believe,

Starting point is 00:21:21 they've been demoing this all over the place, right? So we know the technology exists. We've seen live demos. So this isn't Google where something is overly, you know, a marketing video and you're like, wait, does this actually work? No. Okay. So OpenAI presumably has this working, but we don't have access to it right now. All right.

Starting point is 00:21:42 So let's go to number five. And this one is interesting. So right now, this is not available in the UK, EU, and a few other territories. And it is actually seemingly illegal in EU schools and workplaces. I kid you not. So like if you go look online, there are a lot of frustrated users, right? Especially in the EU, who are like, hey, I don't have access to this. What a bummer, right?

Starting point is 00:22:19 You've been waiting, you know, it's millions of people. And the reason is the EU has some fairly strict AI laws that are coming into effect fully in 2025. And then I have to shout out the person that found this, Dean Ball on Twitter, and I verified this, and it does appear true. So this is from the AI Act in the EU. So it says this is something that is not allowed. Ready?

Starting point is 00:22:48 So the placing on the market, the putting into service for this specific purpose, or the use of AI systems to infer emotions of a natural person. Right? So yeah, AIs that can infer emotions are not allowed. And that is something that advanced voice mode is very good at doing. So yeah, like we haven't seen an official statement from OpenAI, but they did say, hey, it's not available in the EU, UK, and some other territories. And presumably it's because it might not be totally

Starting point is 00:23:27 legal, at least to use in the workplace. So this is not everywhere, but it says in the areas of workplace and education institutions, except where the use of the AI system is intended to be put in place or into the market for medical or safety reasons. So presumably, OpenAI is going to be figuring that out because obviously the EU, UK, those are big markets, right? Outside of the US, those are some of the largest markets in the world. And they obviously want to have paying customers in those markets. So yeah, a lot of people were up in arms on this, you know, on the internet, but it seems like it might not be legal right now. So they might be making some updates. There might be a second version. And I actually think that's a trend to watch out for, right? I think we may kind of have

Starting point is 00:24:19 versions of large language models, right, where it's essentially like, oh, the EU version. Right. So we might have advanced voice mode, the EU version, or chat GPT, the EU version. Because so many of these restrictions mean that full products can't be used in those places, right? Especially when we look at the future of multimodality, it's going to be tricky, right? And I think the future of work is interfacing with a live agent, right? I've said that since day one, right? I think it's actually going to be agents we build or agents that we spin off in, you know, thinking where you can, with natural language, you know, spin up a version of any large language model that can interact with you in real time and do specific tasks on your device.

Starting point is 00:25:06 But might not be legal everywhere. All right. So let's do this. I hope this works, y'all. That's the fun thing about doing a live stream is sometimes things get finicky. And there's also limits. So, and also right now, if I didn't see, if you have any ideas that you want to see demoed, you know, get it in now. So someone saying like, ask the weather, can't tell you yet, can't access real time information because it cannot access browse with Bing.

Starting point is 00:25:40 So yeah, it's like if I'm being honest, that's the only thing that I use like Alexa for. It tells me the time and the weather, right? This cannot do either of those, I believe. So yeah, a couple other questions and get your suggestions in, all right, what you want to see demoed. I have some examples. All right. So someone asking, can you use voice mode inside of a custom GPT? So yes and no.

Starting point is 00:26:05 Okay, you cannot use the new advanced voice mode. You can use the old voice mode, which I'm going to talk about a little later after I do the live demo. All right, let's go ahead. Let's see if we can get this to work and do a live demo, y'all. All right. Here we go. All right, so just to show you all, and you probably won't be able to see this obviously on the podcast, but I am sharing my phone screen right here. So this is live. This isn't edited, right? So I'm going to swipe down and you'll see, right? All right, so we are live on my phone screen, and hopefully

Starting point is 00:26:43 you'll be able to hear this. All right, so I'm going to click this. Hopefully I don't run out of time. I think people said the limits are like 90 minutes. I used it quite a bit yesterday, so hopefully we don't run out live here. And hopefully you will also be able to hear this. So let me know, live stream audience, if you can hear. Also, I'm going to be a little rude. Okay. Yes, I am going to be cutting advanced voice mode off because that is a feature, right? So yes, I'm not being a rude human. And I don't necessarily think we need to hear, you know, 60 seconds of a demo, right? If you hear 15 to 20, you probably get the gist of it. All right. So if I'm cutting advanced voice mode off, I'm not being rude. I'm just trying to, for brevity, right? All right, here we go.

Starting point is 00:27:33 Hello. All right. So normally it's going to go blue here. So it's not working. Obviously, right? Even though I just demoed this. I just tried this right this morning. So let's try one more time. Hello. There we go. Hey, what's going on? You're on a live stream. Can you say what's up to all the everyday AI listeners?

Starting point is 00:28:59 AI listeners, what's up? Thanks for tuning in. All right, I threw that on mute. Everyone, can you let me know? Could you hear that?

Starting point is 00:29:21 I'm not sure if it's picking up in my mic or not. So I'm on mute, but then we're going to get into this and we're going to kind of see what this advanced voice mode of ChatGPT can do. So normally it would be listening to me right now. I clicked the mute button so I can still interact with you all. So live stream audience, if you see the little red mute sign, that means it's not listening. Otherwise, it's going to listen to me and interact with me the entire time. All right.

Starting point is 00:29:47 So this is actually a live stream about your new capabilities. Can you just really quickly tell us what you can even do? Absolutely. I can chat with you about all sorts of topics, answer questions, help with learning new things, and even assist with tasks like writing or brainstorming ideas. We can also talk about the news, but remember, I don't have access to real-time information. And of course, we can just chat for fun too. Cool. So one thing that I want to hear is can you maybe talk a little bit about AI, but can you do it in like a cartoonish voice and speak

Starting point is 00:30:28 really fast. Sure thing. So AI is like this super brainy character that can learn tons of stuff. Can you maybe speak maybe a little deeper though? That was, I don't know. I can't hear that very well. Absolutely. Let's try this.

Starting point is 00:30:45 AI is a powerful tool that can process vast amounts of information. Can you go even deeper? But, you know, I want to learn about AI. So maybe can you tell me about large language models, but maybe make it way funnier and just speak a little more quickly. Got it. Large language models like me are like supercharged pair. Can you speak even faster? Normally I listen to things on like 2x, so just feel free to kind of speak as fast as you can. Okay, here we go. Large language models like me are trained on heaps of text so we can generate

Starting point is 00:31:18 human-like responses. It's like having a conversation with a super speed reader who's also a fast talker. We use patterns and data to guess what comes next in a sentence, making us great at chit-chat, answering questions, and occasionally cracking. Okay, I get it. That's cool. So maybe can you tell me a couple jokes about large language models? Keep them short and maybe work in some references to cute kittens. I love kittens. Sure. Why did the large language model adopt a kitten?

Starting point is 00:31:51 Because it wanted to learn how to perfect its responses. And why was the language model so good with cats? Because it always knew the purr-cise thing to say. All right. That's pretty cool and all. Maybe I want to hear, can you maybe say the alphabet backwards and maybe work in some hip hop references in there. So, you know, I'm trying to learn the alphabet backwards.

Starting point is 00:32:21 It's hard. It's always plagued me for many years. Can you do it? But like I said, work in some hip hop, you know, references. Maybe 90s hip hop even. All right, let's do this. Z-Y-X-W-V-U-T-S-R-Q-P-O-N-M-L-K-J-I-H-G-F-E-D-C-B-A. And just like that, we're dropping rhymes in reverse, taking it back to the old school like a hip-hop alphabet throwback.

Starting point is 00:32:46 All right, well, that wasn't the best because you really didn't integrate it in the middle. So I want to switch up gears here. Can you maybe tell me a little bit about where you see the future of AI going? Let's say in the year 2030, and maybe talk in an ominous yet fun voice. I don't know if you can even mix ominous and fun, but give it a try and really go for the drama. In the year 2030, AI will be everywhere, seamlessly integrated into our daily lives. It will anticipate our needs, like a digital butler with a crystal ball, making life smoother and more efficient. But fear not, for this future is as fun and habit.

Starting point is 00:33:35 So just hang on one second. I'm going to go ahead and talk with our audience here real quick, but you aren't invited. Sorry. No worries. I'll just be here waiting in the wings. All right. So hopefully you guys can see some of the kind of pros and cons.

Starting point is 00:33:57 So I'm going to get, I saw a couple questions, but if you do have something that you want to see demoed, you want to hear, just go ahead and put it in there. So a couple of things. Let's talk about access. So voice mode itself has been out for many, many months. Okay. Also, OpenAI did start rolling out the beta to advanced voice mode. So I'm going to show you the difference here at the very end.

Starting point is 00:34:26 So the normal voice mode, it was not neural. It couldn't do all of these things. It couldn't talk with emotion. It couldn't mimic certain things. It couldn't sense the emotion in your voice. That's another thing. So we've had a voice mode. But it was much more in line with a Siri or Alexa.

Starting point is 00:34:46 It wasn't very capable. There was a gap. So normally, depending on the complexity of your question, you would have to wait anywhere from maybe two to four seconds. So kind of like Alexa, actually, it was a longer gap. So there's been a standard voice mode and that standard voice mode is still there. So this new advanced mode was just rolled out to everyone in the last 24 hours. So some people were lucky enough to get some alpha access to this and beta access, but very, very few people have had access to this. So the normal

Starting point is 00:35:22 voice mode has been available for many months. If I'm being honest, the normal voice mode was not super impressive because you couldn't interrupt it. It couldn't sense tones in your voice. It couldn't vary the way that it spoke. It was very much like an Alexa, a Siri, but slower. All right. So I saw a couple questions on that. All right. Marie had this question earlier about the emotional spectrum. Let's actually test that out quick. And then like I said, if you have anything you want to see demoed, let me know. So let's go ahead and try some emotional detection now. And like we said, this might be the reason why it can't go to the EU.

Starting point is 00:36:04 And I think this is actually one of the more impressive features. Aside from it being almost in real time and it can tell you literally anything and personalize any information to you in real time, I think this ability to both sense the user's emotion and speak with emotion is very impressive. So let's go ahead and do some tests on that now. Hey, so I'm going to talk and I want you to kind of sense how I'm feeling. Does that sound good? Absolutely.

Starting point is 00:36:34 I'll be listening closely. All right. So this is so cool. This new advanced voice mode, it's awesome. I can hear the excitement in your voice. Sounds like you're really enjoying this new feature. Yeah. Yeah, I mean, I'm enjoying it, but if I'm being honest, I thought that I was going to be able to talk with my GPTs.

Starting point is 00:37:02 Sounds like you're feeling a bit let down. It's understandable. Expectations can be tricky like that. But I don't know. I think maybe the future, we might see something better. We might see more capabilities. I sense some hopefulness there. The future of AI is indeed full of exciting possibilities. Who knows what cool advancements are just around the corner? So are you guys seeing and understanding the magnitude of like what's happening there? I could say the same words in 10 different tones and how it responds to me is going to be dependent, right? So not just that it can detect those things, right? But it will base its response

Starting point is 00:37:54 on the emotion, on the emotional cues that I'm putting forward, right? And presumably when this happens with video, it's going to be looking at your face as well to sense it, right? So if you have frustration on your face, even if you're not saying anything, it will be able to sense that. So again, think about what this means for the future of work. I mean, the future of society, right? Like as an example, this is probably a great, you know, I'm not vouching for this, but, you know, if you need a personal coach, you know, if you're not sure if you need to go into therapy, right, you can have a really unofficial chat with this advanced voice mode. I would always start by giving it a lot of instructions, right?

Starting point is 00:38:39 So we teach, as an example, the Prime Prompt Polish technique in Prompt Engineering 101. Unfortunately, with this new advanced mode, like I said, you can't go in there and type a lot. You can't share your files, all these things. So I would normally start with some very pointed and specific instructions on who you are, what you're trying to do, and what you want that specific chat to accomplish. Otherwise, you know, I don't think you're going to get a lot of benefit out of this. All right. So yeah, Tara says, amazing, on-demand mental health care. Yeah, that's one thing.

Starting point is 00:39:11 Joe saying even the computer on Star Trek had slower response time. Yeah, it's super quick in terms of response time. Marie asking, Jordan, what do you think will happen to AI companies like ElevenLabs, WellSaid, Reese feature, now that Microsoft, et cetera, have these voice mode features? I don't think anything initially because out of the box, there is no way to download this, right? So the main reason that people use ElevenLabs, WellSaid Labs, is they download those and use those for commercial purposes. So at least right now, I'm sure there's going to be people that hack this, right, to do that.

Starting point is 00:39:53 But I don't think that that's going to happen right away. And I probably don't see that happening inside advanced voice mode, right? Where you can, you know, kind of toggle things. And, you know, there's a lot of things that you can do in ElevenLabs that I don't think you'll be able to do in advanced voice mode. All right. Let's do maybe one or two other quick demos. And then I'm going to wrap this up with a couple things that you need to know. All right, let's go. All right, let's do one more thing. I want you to mimic how I'm speaking. So if I sound happy, you should sound happy. If I sound angry, you should sound angry. Got it? Got it.

Starting point is 00:40:36 I'll mirror your emotions as we go. All right. So let's get started. All right. I'm ready when you are. Let's do this. All right. So can you also try to mimic my cadence, my tone, and everything else, not just emotion? But try to mimic if I speak slowly, you should speak slowly.

Starting point is 00:41:00 If I'm fast and confused, you should try to speak fast and confused. Got it? Absolutely. I'll match your cadence, tone, and style as closely as I can. Let's give it a try. All right. Let's... All right.

Starting point is 00:41:18 Let's talk about... Let's talk about... Okay, that's not bad. Now let's maybe chit-chat a little bit about the weather. So I think right now this is not my favorite time of year. I like summer and not fall, and now fall's here, and I'm kind of bummed. I hear you. Fall can be a bit of a bummer if you're a summer person.

Starting point is 00:41:39 All the sunshine and warmth just... All right, so... So I... I, that didn't really work as I thought it would, right? I was doing some testing with that last night. It was doing a better job at picking up on my cadence and speed. And there, it didn't really actually do a good job with cadence. So I think it does a good job with detecting emotion and reflecting emotion.

Starting point is 00:41:59 Yeah, cadence thing, you know, I can tell it to speak faster. So, you know, let's do this. Tell me about the 1990s Chicago Bulls. The 1990s Chicago Bulls were legendary. Okay. Maybe tell me about the Chicago Bulls, but speak very, very quickly and excitedly. The 1990s Chicago Bulls were absolutely incredible. Okay, maybe even a little more quickly and even more excited.

Starting point is 00:42:27 I'm saying like pump the excitement through the frigging roof. The 1990s Chicago Bulls were amazing with Michael Jordan, Scottie Pippen, and coach Phil Jackson. They absolutely dominated the NBA, winning six championships and making basketball history. Okay, that's great. All right. Now let's forget this. So, all right, I threw it on mute, y'all. I want to do one more thing. What's next? I want to do one more thing. So one of the biggest things that I think this is going to be a hit for, especially early on, is essentially having a consultant, you know, you could say a friend. That's not what I'm going to use it for. But having a consultant in your pocket that can talk to you in real time, right? So think: you're a CEO, you're driving. You have an hour in the car every day. Please, first of all, you know, spend that first like 30 minutes listening to the Everyday AI podcast. But then maybe you can throw on advanced voice mode and talk through some of the things that you're working on. So as an example, I'm going to try that right now. So I'm going to envision, right,

Starting point is 00:43:33 I'm in the car. And I have, even though I'm lucky enough, you know, my commute's like zero. You know, we work half from home, half in our office, which is like a mile down the street from where I live. So I don't have a long commute, but I'm going to pretend I did. And I'm going to talk through a current problem. So you have to remember, you cannot share your data unless you talk about it through your voice. Also, it does not have access to up-to-date, real-time information. However, so many things are timeless, y'all. So let's just do one example. All right. So now, forget everything else that we've talked about, and I want you now to focus on being a very experienced, high-powered digital consultant.

Starting point is 00:44:14 So think you have experience at Deloitte, McKinsey, and all the big consulting companies in the world. You focus specifically on digital strategy and your specialty is email marketing deliverability. So with that, I'm going to tell you a little bit, my company is called Everyday AI. We have a daily newsletter, and I'm trying to both increase the click-through rate and get new people to subscribe to our free daily newsletter at youreverydayai.com. So I want you to ask me some questions, but let's just keep it super brief and short. Let's not get verbose. All right. So go ahead.

Starting point is 00:44:51 Absolutely. Focusing on your goals for everyday AI. What's your current click through rate and how are you currently acquiring new subscribers? So the click through rate has actually gone down recently. So we were at about 8% but now it's closer to the 5 to 6%. I think it's because we just took out ads, brought in a bunch of new subscribers, and I think those people are opening, but they're not really clicking. So I really want to get back to that 8 to 10%. Industry average for AI newsletters is about 2%. So we're still way higher than that, but I really want to get back up to 8 to 10%. What should I do? Understood. To boost your click-through rate, consider segmenting your audience more precisely to

Starting point is 00:45:36 tailor content that resonates. Also, experiment with more engaging. I don't think segmenting is an option. We send a daily newsletter to like 15,000 people. It's one newsletter to 15,000. So what else should I specifically be doing? Please be pinpoint specific because I'm not really interested in general strategies because I'm like in the weeds. So get super specific. Got it. First, optimize your call to action placement and language within the newsletter to make it more compelling. Second, A/B test different content formats or visuals to see what drives more clicks. Third, ensure your most valuable content is above the fold, so it's seen first. These targeted tweaks should help improve your click-through rate. Okay, cool. So someone is asking about your ability to speak

Starting point is 00:46:30 multiple languages, and I didn't even talk about that. So maybe can you just summarize what you just said in two sentences, but say it in Spanish and make it kind of funny too. Okay. So maybe can you give me next steps for growing my email newsletter, but maybe could you do it in the voice of a Minion? I love the Minions. I had a, I don't know where it went. So just talk to me now like a Minion and tell me what should I do for my next step of increasing my email newsletter open rate. Minion mode activated. Buvra.

Starting point is 00:47:18 Test the subject lines much catchy and put emojis for happy fias. Also, CTA, very important. Ponlo Arriba. Okay, that's good. I don't know if this Minion is still speaking a little bit of Spanish. So let's just go straight English and really lean into the minion. I actually speak fluent minion and also be ultra specific with the advice about click-through rate.

Starting point is 00:47:45 Go. Okay. Banana. First make subject line super fun, like free bananas inside, then put big bright buttons for clicky click and test different colors like yellow. Minions love yellow. Just recap that, but maybe in Dutch. Keep it in minion mode, but in Dutch. Okay, banana. First make under verprejo like so as gratis banana in bein set a grote helder knoten for click-click-c-c-hmm and test

Starting point is 00:48:24 for schellend the color in the color so as peil all right all right I got it I got it let's let's wrap this up so maybe no more minions, maybe talk to me in a fun voice and just tell me, 2050, what is going to be happening with AI? Are we going to be, are we going to have super intelligence? Sam Altman the other day said that we're going to have super intelligence in a few thousand days, which is like, could be anywhere from like five to 10 years. So tell me about the year 2050. Be direct. Don't be politically correct. You have to take a side here and be very, very detailed. AI. 2050, what happens?

Starting point is 00:49:05 All right, let's dive into 2050. By then, AI will likely be deeply integrated into all aspects of life. All right, let's keep this a little more direct. I want you to be curt almost. No wishy-washy political correctness. Just kick it to me straight. You know, don't be wishy-washy or be too wordy. Just tell me what's going to happen.

Starting point is 00:49:28 By 2050, AI will be running major infrastructure, health care, and transportation systems. We'll have AI companions with human-level intelligence. Super intelligence could be a reality, fundamentally changing how we live and work. It's a new era of human-AI collaboration. Are you still going to be here in 2050? Well, I don't age, so I'll be around as long as you need me.

Starting point is 00:49:56 Ready to chat in 2050 and beyond. All right, cool. So bye, we're wrapping this up. All right. It was great chatting with you. Take care. All right. So we're going to be doing a lot more on advanced voice mode in the coming days and weeks. I think there's a lot of interest here. Right. So let's go ahead and wrap this up. There's a couple of things that I want to talk about. Well, what's next? Okay. Are we going to see the visual and voice, this combination, right? So imagine that, but having access to it on your computer and being able to click one button and it sees what's on your computer and you can talk to it in real time, right?

Starting point is 00:50:37 That's a lot different than me trying to explain to her about the everyday AI newsletter or whatever else. If it can see the problem you're working on, think. I mean, that's huge. The other thing I have to talk about is a lot of people might say like, okay, well, why even use this? I think just having a high-powered consultant, a second person to bounce ideas off of, you know, if you have long commutes, time to kill, that's great.

Starting point is 00:51:01 But the other thing is this just expands what humans are capable of. So as an example, right, right now when you're doing work, even if you're using a large language model, you're probably typing and reading, right? So even if you're not using a large language model, you're just doing standard work. You're typing and reading. Yes, I do think this advanced voice mode will eventually have the ability to access other data, right? So be able to upload your files, to be able to browse with Bing. So I want you to think forward to that.

Starting point is 00:51:28 We're talking about what's next. That is huge, right? So it's a big bummer that this did not roll out as is, but I get from OpenAI's perspective, they wanted to have a smooth rollout. Presumably, if you give access to, you know, tens of millions or hundreds of millions, however many people have a ChatGPT Plus paid plan, it's probably going to run into a bunch of issues, right? So you can't use it on the o1 model right now. You can't use it with your own data.

Starting point is 00:51:55 But let's imagine once you can and talk about what's next. Well, I think the biggest thing is this is probably the future of interfacing with large language models. Here's why. Well, the average human types at about 40 words per minute. Okay. The average human talks at about 150 words a minute. Okay. So I can in theory, people are always like, oh, how do I measure the, you know, return on investment of artificial intelligence? Well, that right there, right? Yeah, I mean, this wouldn't really work if you have a hundred people in a small-ish office and everyone's yapping to their assistant. But y'all, like, I mean, we have this whole work from home society. Remote work is very standard, right?
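To put rough numbers on the typing-versus-talking comparison above, here is a minimal Python sketch. The two rates are the round figures quoted in the episode, not measured data, and the function names and the 600-word example are hypothetical, chosen only for illustration.

```python
# Rough averages quoted in the episode (words per minute), not measurements.
TYPING_WPM = 40
SPEAKING_WPM = 150


def minutes_to_convey(words: int, wpm: float) -> float:
    """Minutes needed to get `words` words across at a given rate."""
    return words / wpm


def speedup(fast_wpm: float, slow_wpm: float) -> float:
    """How many times faster the first rate is than the second."""
    return fast_wpm / slow_wpm


# Hypothetical example: a 600-word brief for the assistant.
brief_words = 600
print(f"Typing:   {minutes_to_convey(brief_words, TYPING_WPM):.1f} min")
print(f"Speaking: {minutes_to_convey(brief_words, SPEAKING_WPM):.1f} min")
print(f"Speedup:  {speedup(SPEAKING_WPM, TYPING_WPM):.2f}x")
```

Under these assumed rates the ratio is 150 / 40 = 3.75, which is where the "almost four times" figure comes from. Real gains would depend on the individual and on how much editing spoken input needs afterward.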

Starting point is 00:52:42 Probably once this gets access to these other modes, I'm going to be doing a lot of talking, right? I'm a fast typer. Don't get me wrong. But y'all know I can talk really quick. And yeah, I can type fast, but I can in theory get four times more done if I'm just talking. The same thing with listening, right? So listening versus reading. So the average person can read, you know, 250-ish words per minute. The average person can actually understand up to 400, 500 words per minute when they're listening. So think of it like that. You know, you can talk almost four times faster than you can type. And you can listen almost about twice as fast as you can read. This just opens up a whole

Starting point is 00:53:28 new way to work and then think once it has access to your data, once it can see what you're working on on your screen. When I tell people, hey, the future of work is generative AI, maybe you can see that. The other thing, yes, this is kind of a bummer. If this is your first foray into working with a voice assistant inside of a large language model, you can still use the old voice mode, okay? Then you can use the old voice mode in GPTs. You can upload your files and documents. It's not as intuitive, right? It can't speak to you in all these fun ways. It can't speak with emotion,

Starting point is 00:54:07 sense your emotion. You can't have a true conversation, but you can have a delayed, but still very insightful and powerful conversation using the normal voice mode. And again, if I go back into this chat, right, so I can go back into my ChatGPT account, I'm going to see all of that information. But as soon as I start typing, I can no longer use that same chat for advanced voice mode, but I can always use the old or standard voice mode. So the standard voice mode, if I'm being honest, I think that has been a wildly overlooked tool in the tool belt because everyone, we've known for four months that we're getting this new voice mode. So we've been looking at the standard voice mode as, ah, you know, not really for me. It's not that good

Starting point is 00:54:52 because we're comparing it to something we didn't have. But I think we've actually had a gem in that standard voice mode for months. And the reason I say that is because it can work with all your data. It can work inside of GPTs, right? So when we talk about what's next, we have to know that this advanced voice mode is going to grow in its capabilities. However, when you see all the downsides right now, you can kind of do a lot of them if you're a little more patient, right?

Starting point is 00:55:19 You don't have that very neural conversation. You don't have that low human latency. It's delayed, you know, even more so than Siri. But the responses are targeted, spot on. So if you can be patient and if you don't need all the bells and whistles, I think the standard voice mode is actually a gem. No offense there to Google and their Gems. All right.

Starting point is 00:55:44 So that's it, y'all. I hope this was a fun one. I know it lasted a little long. I wanted to try to involve the audience, ask your questions, because not everyone has access or they haven't had the time to dig into this. So if this was helpful, please go to youreverydayai.com. Sign up for that free daily newsletter. And you know what?

Starting point is 00:56:03 Can you guys open it and maybe click on it, right? Because otherwise, I'm going to have to keep talking to this minion to tell me how to improve those click-through rates, right? All right. So hope to see you back tomorrow and every day for more Everyday AI. Thanks y'all.

Starting point is 00:56:29 And that's a wrap for today's edition of Everyday AI. Thanks for joining us.

Starting point is 00:57:03 If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
