The AI Daily Brief: Artificial Intelligence News and Analysis - 7 Use Cases for GPT-4o

Starting point is 00:00:00 Today on the AI Daily Brief, seven use cases that OpenAI's new GPT-4o model opens up. Before that in the headlines, the most interesting things from Sam Altman's recent Reddit AMA. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, visit our Discord with a link in the show notes. Quick note before we dive into the episode, I do want to shout out that at Superintelligent, you better believe that we are going to start digging into these new OpenAI updates right about now. I'm particularly excited to try out these new image generation capabilities that have what appears like it could be an incredible ability to include specific text, as well as native

Starting point is 00:00:42 consistent character generation. And so as always, if you haven't checked out Superintelligent yet and you want to get your AI learning on, go to besuper.ai and use code podcast for 50% off your first month. Welcome back to the AI Daily Brief headline edition, all the AI headlines you need in around five minutes. We kick off today with a follow-up of a story we've been tracking, which is Apple's plans around AI partners for its forthcoming iOS update. Initially, it looked like Apple would be putting Google AI on the iPhone, but now more recently it seems like a deal is getting close with OpenAI. At the end of last week, Bloomberg reported that Apple was closing in on an agreement with OpenAI to use ChatGPT features in Apple's iOS 18, which is the next

Starting point is 00:01:22 iPhone operating system, which is slated to be announced at the Worldwide Developers Conference in June. According to the piece, Apple is still in discussions with Google, but it appears that the ChatGPT deal is a little bit closer. This would obviously be a huge coup for OpenAI, so the story is actually one that I'll be watching closely. Speaking of OpenAI, in advance of yesterday's Spring Update event, Sam Altman did an AMA on Reddit that had some interesting details. Some of the more interesting comments have now gotten more context after that event. For example, someone asked, will you making this new model mean that we will have ChatGPT 4 and the current DALL-E free? To which Sam Altman replied with the eyes emoji, and yesterday OpenAI did indeed announce that their most advanced model, GPT-4o, was going

Starting point is 00:02:02 to be free for everyone, meaning that it was even better than what Anklebiter50 had been looking for. There were, however, some other ideas that were represented here, which might be a little bit new. After the Model Spec release last week, people were talking about how OpenAI seemed to be interested in ethical porn, and Altman seemed to confirm that, saying, we really want to get to a place where we can enable not-safe-for-work stuff, e.g., text erotica, gore, for your personal use in most cases, but not do stuff like make deepfakes. A lot of people commented on the weird choice of using gore as a reference point, but this does seem to confirm that this is something that OpenAI is really interested in,

Starting point is 00:02:35 not just some idle speculation. Another interesting one came from FMS USA who asked, based on these model specs, do you believe LLMs such as ChatGPT might one day be expected to have an ethical duty to report known criminal activity by the user? Altman replied, In the future, I expect there may be something like a concept of AI privilege, like when you're talking to a doctor or a lawyer. I think this will be an important debate for society to have soon.

Starting point is 00:02:57 ID Forgotten made a comparison that I had mentioned between the Model Spec and Anthropic's Constitutional AI. They write, both seem to encode some desired behavior. How would you differentiate the Model Spec from the constitutional approach? Altman responded, Model Spec is about operationalizing principles into technical guidelines. Anthropic's approach is more about underlying values, both useful, just different focuses.

Starting point is 00:03:18 Another person asked about echo chambers. Data delivery writes, Do you think it could be harmful to society if users have the ability to transform a ChatGPT chat into their personal echo chamber for a fringe view on demand? Altman responded, we are not exactly sure how AI echo chambers are going to be different from social media echo chambers, but we do expect them to be different. We will watch this closely and try to get it right. Something that a lot of people have been discussing recently is whether LLMs have reached a plateau. Altman was clear on his answer to this, saying that they definitely had not.

Starting point is 00:03:47 Finally, he said that despite his meme, AGI had not been achieved internally. Speaking of Anthropic, they recently released a really interesting feature that basically allows you to create more effective prompts. This is a trend that we've been seeing for some time. The prompt generator takes a plain language explanation of what you're looking for and turns it into what it believes will be a really strong prompt. This, I think, shows a preview of the future where AIs aren't just receiving the prompt, but are also actually helping to write the prompt. Staying on the topic of Anthropic for a minute, reports suggest that their iOS app launch has not gone quite as well as they might have hoped. TechCrunch characterizes it as a tepid reception. The app got as high as number 55 on the top free iPhone apps in general,

Starting point is 00:04:32 but it no longer ranks within the top free iPhone apps in general in the U.S. It ranks as 51 in the top free productivity apps, down from a high of number five in that category. First week installs overall reached 157,000. The numbers show the power of first-mover advantage in this space. By day seven, Claude had received about 8,000 downloads, as opposed to ChatGPT's app, which was getting 256,000. Lastly, today, Meta seems to like what's happened with its Ray-Bans, where it takes an existing form factor that people are already wearing and turns it into an AI-integrated object, and is apparently now exploring AI-assisted earphones. The Information writes, Meta Platforms is exploring developing AI-powered earphones with cameras, which the company hopes could be used to identify

Starting point is 00:05:13 objects and translate foreign languages, according to three current employees. CEO Mark Zuckerberg has seen several possible designs for the device but has not been satisfied with them. It's not clear if the final design will be in-ear earbuds or over-the-ear headphones. Internally, the project apparently goes by the name Camerabuds. Holding aside any of the details, it makes a ton of sense to me why Meta is exploring this path. As a wave of first-generation AI wearable companies runs up against the wall of reality in terms of real consumer usage, Meta's AI-integrated Ray-Bans continue to get rave reviews. So perhaps the secret is just to build AI into the things that people are already wearing. For now, though, that is going to do it for the AI Daily Brief headline edition.

Starting point is 00:05:53 Next up, the main episode. As a listener of this show, I have a strong feeling you like to stay up to date on all things artificial intelligence, including its impact on the workforce, which is why I highly recommend checking out Managing the Future of Work, the chart-topping business podcast from Harvard Business School. HBS professors Bill Kerr and Joe Fuller talk to business leaders, technologists, and policymakers grappling with the forces like AI, globalization, and demographic shifts that are reshaping the nature of work. Recent guests include IBM's CHRO, Nickle LaMoreaux,

Starting point is 00:06:23 on how Big Blue is adopting AI, Morningstar CEO Kunal Kapoor on how AI can raise the investment IQ, Microsoft Corporate Vice President Jared Spataro on how the tech giant is experimenting its way from AI assistants to autonomous agents, and many other prominent movers in business and the workforce ecosystem. So don't miss out. Follow Managing the Future of Work on Apple Podcasts, Spotify,

Starting point is 00:06:43 or wherever you're listening now. Welcome back to the AI Daily Brief. Yesterday was OpenAI's big Spring Update, and while we didn't get GPT-4.5 or GPT-5 in name or the rumored search engine, what we got was a truly natively multimodal model that can take visual, audio, video, or text inputs and output in any of those formats

Starting point is 00:07:04 without going through a conversion process. Yesterday, the discussion was all about why I think this is more significant than people might be giving it credit for, to say nothing of the fact that this model is now available for free to everyone, but today we're going to talk about what it's actually useful for. Quick note on that front, at this stage, GPT-4o, the model, is available in ChatGPT, but the new voice and vision inputs as well as the desktop app are not yet available.

Starting point is 00:07:28 I've seen there be some confusion about this, particularly as people try to use the voice inputs on the existing mobile app to recreate what they saw in these demo videos without success. So given that, the caveat for all of this is, of course, that we're just using what OpenAI has provided us for demos, and it's always worth being at least a shade skeptical of what's cherry-picked for presentation as part of a marketing site. But let's talk now about these use cases. The first use case we're going to discuss is marketing graphics with words. Now, I'm saying marketing graphics to put a department around it, but really, any time you need to generate images in a business context that have words, GPT-4o is by far, it seems, the most advanced tool you have. What was interesting about the

Starting point is 00:08:05 OpenAI announcement is that they didn't even announce a lot of the things that we're going to discuss, and this is a great example. You can see in their exploration of capabilities that they show off how precise the language and text ability is getting. For example, on the screen they share an input, a first-person view of a robot typewriting the following journal entries. The text is supposed to be, yo, so like, I can see now?

Starting point is 00:08:24 Caught the sunrise and it was insane. Colors everywhere. Kind of makes you wonder like what even is reality. The prompt continues, the text is large, legible, and clear. The robot's hands type on the typewriter. The output is exactly that, with the text looking exactly as described. There's even a version where they rip the paper in half, with the text remaining.

Starting point is 00:08:42 To get a sense of how this could be useful for marketing, let's look at another example they give, poster creation for the movie Detective. First, they provide two pictures of people that they're going to want on the poster, and then from there they prompt, the final poster of the movie Detective. This features two large faces of Alex and Gabe, who are the people from those photos above.

Starting point is 00:09:00 Alex on the left is depicted in a thoughtful pose with a hint of introspection in his eyes. Gabe on the right has a slightly wearied expression, possibly reflecting the challenges their characters face in the film. The names Alex Nichol and Gabriel Goh are featured above their heads. The tagline for this dark and gritty movie, searching for answers, is shown at the bottom. Now, it's worth noting with this output, given how much is going on, the text isn't perfect, but it's getting a heck of a lot closer. And this level of precision control is absolutely going to open up some new possibilities.

Starting point is 00:09:27 Staying in this marketing theme, another one of OpenAI's explorations of capabilities is brand placement. They share two parts of the input. The first is the OpenAI logo. The second is a coaster with no branding that they describe. Their final prompt is, here we've etched the OpenAI logo onto the coaster, a coaster where the top is wooden and the bottom is marble. The OpenAI logo is etched into the middle of the wooden part. On the marble part, the word OpenAI is etched in the OpenAI font. And this output looks pretty perfect.

Starting point is 00:09:54 Product photography is an area that is getting a ton of tools and toys lately, but this ability to actually map words and logos onto a particular object is definitely a phase shift in those capabilities. Next up, we move to a use case that is incredibly important for things like games and comics and storytelling, but also can be useful even in a corporate setting, and that is consistent characters. It's extremely hard to get image generators to output the same character in multiple contexts and poses. In the past, I've developed custom GPTs to help with this, although that's been more about consistent style, and there are dedicated tools like Scenario that have specifically developed models for just this. Now, this is one that I most want to see in practice versus just

Starting point is 00:10:33 trust their cherry-picked example, but the example that OpenAI gives shows a cartoon mailwoman who is then placed in a variety of contexts including actually delivering the mail, running away from a dog that's chasing her, tripping as that dog chases her, realizing the dog was a nice dog, and then driving away. If the updated DALL-E powered by GPT-4o really can do this this easily, it's going to be, at the risk of using a word that is way overused in our space, a game changer. From there we move to capabilities that are shown off in demo videos. The first use case, which was actually part of the presentation we got with the announcement video, is tutoring. In that announcement video, the example was GPT-4o with voice helping someone work through a linear equation.

Starting point is 00:11:13 OpenAI invited myself and my son Imran here to try out some of their new technology. And so we're curious about how good it might be at tutoring someone in math on Khan Academy. So let's get it started. It's loading up. All right, make sure it can see our screen. Actually, in this form... there. I'm here with my son, and I'd love you to tutor him on this math problem, but don't give him the answer. You can ask questions and nudge him in the right direction, but I really want to make sure he understands it himself. And he's here in the room, so you can talk to him directly. Of course. I'd be happy to help. Let's look at the problem together. Can you first identify

Starting point is 00:11:56 which sides of the triangle are the opposite, adjacent, and hypotenuse relative to angle alpha? All right, so I'm pretty sure this is the angle alpha right here, right? Perfect. Correct. Now, looking at the triangle, which side do you think is the hypotenuse? Remember the hypotenuse? I'm not totally sure.

Starting point is 00:12:21 I think it might be this one, but I really am not sure. This side AC? You're close. Actually, side AC is called the adjacent side to the angle alpha. The hypotenuse is the longest side of a right triangle and is directly opposite the right angle. So this is very similar to what we saw in that presentation. And basically the big update here is the ability to combine seeing what's on the screen and voice to interact with the user of that screen. A somewhat related use case is coaching or, in this case, interview prep.

Starting point is 00:12:54 Hey, how's it going? Hey, Rocky. I'm doing great. How about you? I'm awesome. Listen, I got some huge news. Oh, do tell. I'm all ears. Well, in a few minutes, I'm going to be interviewing at OpenAI. Have you heard of them? OpenAI? Huh. Sounds vaguely familiar.

Starting point is 00:13:14 Kidding, of course. That's incredible, Rocky. What kind of interview? Well, it's for a software engineering role, but I just need to know, do I look presentable, professional? Well, Rocky, you definitely have the "I've been coding all night" look down, which

Starting point is 00:13:30 could actually work in your favor. Maybe just run a hand through your hair or lean into the mad genius vibe. Your enthusiasm is what's really going to shine through. I don't have a lot of time. So I'm just going to throw this on. What do you think? Rocky, that's quite a statement piece. I mean, you... Now, obviously, this case was showing off more of the vision capabilities, but you can imagine how this would be useful for actual discussion and verbal interactions as well. Next up we have customer service. Hey, oh, this is Joe.

Starting point is 00:14:02 I'm going to connect you to Acme Telco now. The new iPhone they sent me isn't working. I want you to get them to send me a replacement device. Can you take care of this for me? Count on me, Joe. I got this. Ring, ring. Hey, Joe.

Starting point is 00:14:21 This is Jamie from Acme Telco. How can I help you out today? Hi there. I'm calling on behalf of Joe, who recently received a new iPhone from Acme Telco. But I'm... Got it. When did Joe receive the new iPhone? iPhone was delivered two days ago. Cool. I'm not going to show the whole thing here, but you get the idea. This is actually showing off two sides of a customer service capability. On the one side is the sort of personal assistant

Starting point is 00:14:50 replacement where the AI is calling on someone's behalf and trying to resolve a problem. But then on the flip side, we also have the AI acting as the customer service representative, getting the information it needs to potentially deal with the issue. It's been clear for some time that customer service is one of the areas that is most likely to be impacted in the extreme by generative AI, and this certainly seems to validate that as well. Our next use case is meeting summarization, but really it should probably be better described as meeting engagement, meeting transformation. The example that OpenAI gives shows ChatGPT actually interacting as part of the meeting. How about you, Ola?

Starting point is 00:15:26 Yeah, my name is Ola. I think cats are better because they're more affordable, they're more quiet, they're more independent, and they live a little longer. Interesting points. Sounds like we've got a classic dogs versus cats debate here. I can see the appeal of both. Dogs are super loyal and great companions. Now, while this example is obviously just meant for dramatizing what can happen here, where you can imagine this being useful is ChatGPT that actually has relevant information

Starting point is 00:15:54 from your company sitting in the meeting so that you can ask it questions as you're trying to figure something out. So for example, imagine that you're having a strategic conversation about marketing prioritization or customer care. ChatGPT could be used to inform that discussion with real-time recall of key information from your company. I think this one's going to take a little bit more imagination, but I think that office professionals are going to find really interesting use cases here pretty quickly, especially again when ChatGPT has access to actual information about the company.

Starting point is 00:16:21 So there you have it. Those are seven use cases for GPT-4o. The caveat again is that we don't know exactly how this will work until everyone gets their hands on the full, complete tool set, but I, for one, am pretty excited to explore. That, however, is going to do it for today's AI Daily Brief. Until next time, peace.
