The AI Daily Brief: Artificial Intelligence News and Analysis - ChatGPT Launches Voice, Midjourney Launches 6.1

Starting point is 00:00:00 Today on the AI Daily Brief, OpenAI has finally started rolling out advanced voice mode for ChatGPT. Before that in the headlines, Microsoft's latest earnings and Wall Street dealing with the timeline for AI ROI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the AI Daily News you need in around five minutes. We are in earnings season, and of course the big question for markets is around AI and whether it is going to be a long-term bet exclusively or whether we're going to start seeing some ROI for these big investments

Starting point is 00:00:42 that the hyperscalers are making in AI infrastructure on a quicker time scale. This has been one of the most persistent conversations at the AI Daily Brief for months now. Because the big tech companies are so integral to this new technology shift, Wall Street is involved in a new technology shift much earlier than it might otherwise have been, or certainly than it has been in the past. Unlike innovations in areas like mobile, AI is under a financial microscope right away because of the fact that Microsoft and Meta and Google and all these companies are spending such an extraordinary amount of money to build out their ability to compete in that space. There has been a growing swirl of questions around whether that investment

Starting point is 00:01:20 is likely to pay off. And frankly, I think that in many ways it's forcing Wall Street to operate more like a venture capitalist, which is just a very uncomfortable position for it to be in. The latest information to come to this argument is around Microsoft's just-announced earnings, which have suggested that, yes, the payoff from AI is not going to be immediate. It is a long-term play. So the big news for Microsoft was that its capital spending had risen 77.6% to $19 billion during its fiscal fourth quarter that ended on June 30th. Cloud and AI-related spending accounted for nearly all of that expenditure. For the entirety of fiscal 2024, capital spending totaled $55.7 billion. According to Reuters, group CFO Amy Hood said the spending was necessary to support

Starting point is 00:02:01 demand for AI services, and the company was investing in assets that, quote, will be monetized over 15 years and beyond. Of course, it's not just a question of cost, but a question of revenue. Azure growth ended up being disappointing to analysts, who had expected it to come in around 29.7% compared to the 28 to 29% that Microsoft predicted. Said Daniel Morgan, senior portfolio manager at Synovus Trust, "The street doesn't have a lot of patience. They see you spending billions of dollars and they want to see a pickup of revenue in that amount. If these companies do not hit it out of the ballpark and are far better than the estimates, then they are going to be knocked back." And indeed, after the spending forecast was released, shares fell 7%, although they did recover slightly in after

Starting point is 00:02:37 hours trading. In terms of AI-specific growth, AI services accounted for a slightly larger portion of the increase in revenue in Azure, accounting for 8 percentage points compared to 7 percentage points the previous quarter. Overall, Azure AI is now used by more than 60,000 customers, which is up nearly 60% year over year. One interesting little nugget came from The Information, which writes that TikTok spending drove Microsoft's AI business. According to The Information, as of March, TikTok was paying nearly $20 million per month to buy OpenAI's models through Microsoft. That represented close to 25% of the total revenue Microsoft was generating from that business line. Now, of course, as The Information points out, the risk is that the more concentrated AI sales are,

Starting point is 00:03:17 the more vulnerable to a shift the business is. According to The Information's sources, Microsoft hopes other more reliable corporate customers will make up for that risk. Walmart and Intuit, for example, were each paying Microsoft several million dollars a month to access OpenAI's models during the first quarter. Another AI detail, as shared by CEO Satya Nadella, was that the number of customers that had bought 10,000 or more Copilot subscriptions doubled between March and June. Financial services firms had been among the biggest buyers. Overall, I don't think that this really changes anything. I think that on the one hand, AI is showing up in revenue. It's not like it doesn't have presence

Starting point is 00:03:51 at all, but whether it can meet Wall Street expectations is an entirely different question. I think we might be in a process of slightly repricing expectations when it comes to AI, and if Wall Street can do that in a disciplined and incremental way, that would be a heck of a lot better than it suddenly deciding that it was just in one big bubble. Now, moving on, yesterday we discussed how Meta had updated AI Studio to allow anyone to create their own AIs and to allow creators to create AI avatar versions of themselves to interact with fans, but at the same time, it has scrapped its celebrity chatbots. Less than a year ago, Meta had announced these celebrity chatbots with people like MrBeast,

Starting point is 00:04:25 Tom Brady, Charli D'Amelio, Paris Hilton, and more, but they were never very popular with users. Now, frankly, this doesn't surprise me at all. In fact, I would have been more surprised had these things really caught on, but of course, I'm not exactly the target demographic, so I was willing to suspend my disbelief. I think that the new approach, where people get to create their own AIs, and creators can use versions of themselves to interact with fans in a recurring way, seems a lot more likely to succeed than this weird celebrity thing. Finally, today,

Starting point is 00:04:52 Perplexity, which is both an increasingly beloved but also controversial AI tool for research and searching more broadly, has announced a Publishers program. This is notable because they've recently dealt with some lawsuits from publishers who argue that Perplexity is infringing upon their copyright. The first batch of partners

Starting point is 00:05:08 for the Perplexity Publishers program includes Time, Der Spiegel, Fortune, Entrepreneur, the Texas Tribune and WordPress.com. The deal includes revenue sharing, access to Perplexity's APIs, and Enterprise Pro for all their employees. Importantly, Enterprise Pro is being offered for free for a year for all of their partners. To me, this is another example of how even as the courts race to try to figure out these

Starting point is 00:05:30 questions of copyright in the new AI context, the market is trying to sort them out as well. That's going to do it for today's AI Daily Brief headlines. Next up, the main episode. Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app. It accesses open source models to enable text, image, and code generation without the fear of being spied on or having your data exploited. Discuss anything with Venice without concern about it being monitored, sold, or given to

Starting point is 00:05:56 advertisers and governments. Venice is different because your conversations and creations are kept securely within the browser, never stored or accessible by Venice. Unlike other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you. It simply provides direct access to machine intelligence. No topics are off limits, no ideas are taboo. With Venice, you're in control of the AI as you should be. Pro subscriptions are available for $49 a year or $8 per month. AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit venice.ai slash NLW and enter the discount code NLW Daily Brief.

Starting point is 00:06:29 That's NLW Daily Brief, all one word. Today's episode is brought to you by Superintelligent. As you guys know, Superintelligent is a platform we are building to help everyone, individuals and teams, maximize their use of AI. We help you figure out how to use AI tools, as well as what to use AI for. And this is really important. The whole goal of Superintelligent is not just to give you tutorials and lessons, but to show you how other people like you are actually getting value from AI right now. For those of you who are still out there working, learning, and grinding deep in the

Starting point is 00:07:03 summer, I'm excited to share our best offer ever. If you sign up with code year 50 right now, you will get 50% off the already reduced annual price. That means you'll get access to Superintelligent for a full year for less than $100. Again, that code is year 50 for 50% off the already reduced annual fee. The year 50 code will expire at the end of July. So if you are interested in this, head on over to besuper.ai. And can't wait to see you on the platform. Welcome back to the AI Daily Brief. Today we get to have one of the most fun types of conversations for this show, which is new features. Yes, today we get to go away from policy and regulations and societal issues, and just focus on the really cool things that AI is letting us do.

Starting point is 00:07:51 The first story along those lines is that after months of anticipation, OpenAI has finally started rolling out its much-anticipated voice mode. Now, voice was shown off back in May, but pretty soon OpenAI said that it wasn't quite ready for prime time yet, and they just weren't comfortable releasing it in the state that it was in. Of course, you also might remember that event for the controversy it generated around having a voice that sounded to many like Scarlett Johansson from Her, but that hasn't stopped people from wanting to get their hands on it. As The Verge points out, at OpenAI's event, the new voice mode appeared to be noticeably more capable than ChatGPT's current voice mode. On stage, OpenAI employees were able to interrupt

Starting point is 00:08:27 and ask the chatbot to tell a story in different ways, and the chatbot took their interruptions in stride to adjust its responses. Part of the delay, it appears, was that OpenAI involved more external red teamers during the extended month to try to release a version that is more safe. OpenAI says that they've also, quote, added new filters that will recognize and block certain requests to generate music or other copyrighted audio. So right now, the advanced voice mode is slowly rolling out to paying ChatGPT Plus users. And even within that, it's only a smaller subset that have access. OpenAI said that users in this alpha test will receive an email with instructions as well as a message in their mobile app. And while the rollout has started, it seems like it's not going to be

Starting point is 00:09:03 particularly aggressive. OpenAI writes, "We'll continue to add more people on a rolling basis and plan for everyone on Plus to have access in the fall." Importantly, this also does not include the video and screen sharing capabilities that were also shown off at that event back in May. Still, the first reviews are rolling in. Manuel Sainsily writes, "It feels like FaceTiming a super knowledgeable friend, which in this case was super helpful, reassuring us with our new kitten. It can answer questions in real time and use the camera as input too." Another example from the same person was real-time Japanese translation with an original Pokemon Nintendo game. "I have access to this amazing Pokemon Yellow original Japanese version, but I can't really speak Japanese, so could you help me translate the text?"

Starting point is 00:09:47 "That's awesome! I'd love to help with translations. Go ahead and show me the text, and we'll work through it together. This is the classic Professor Oak introduction. He's saying, hello there. Welcome to the world of Pokemon. Ready for your adventure? Professor Oak is ready to guide you through your journey." Cristiano Giardina says, it's very fast. There's virtually no latency from when you stop speaking to when it responds. When you ask it to make noises,

Starting point is 00:10:16 it always has the voice perform the noises with funny results. It can do accents, but when speaking another language, it always has an American accent. And then he shares a video of ChatGPT acting as a soccer match commentator. "Hey, I want you to act like a sports commentator commenting a soccer match. And I want you to really get into it and scream when a goal is scored, get really excited." "All right, folks.

Starting point is 00:10:39 We're in the final minutes of this intense match. The home team is pushing forward, passing with precision. The striker's got the ball. He's weaving through the defense. He shoots. Goal!" So you can see the emotion modulation is something that makes this more advanced than other voice platforms that we've seen before.

Starting point is 00:10:59 Aaron Tang says, ChatGPT's low latency plus interruption voice is awesome. I'll be relearning Cantonese soon to make my mom happy. OpenAI is helping all relearn so much at any speed. I think translation is likely to be the thing that people try first, as it's one of the most obvious and potentially game-changing uses of this feature. Anyways, I will be keeping track to see whether I get this email, and even in advance of getting access myself,

Starting point is 00:11:21 I will also be keeping an eye out for what people are doing with it. Next up, we move over to a Midjourney update. Midjourney has just released version 6.1. Now, this is Midjourney's first update in quite a while. Version 6 came back around the beginning of the year. While it's been less than 24 hours, a lot of people are really impressed so far. So here's how Midjourney describes what's new in version 6.1. More coherent images, i.e. arms, legs, hands, bodies, plants, animals, etc. Much better image quality with reduced pixel artifacts, enhanced textures, skin, 8-bit retro, etc. More precise, detailed, and correct small

Starting point is 00:11:57 image features. New upscalers with much better image and texture quality, roughly 25% faster for standard image jobs, improved text accuracy, which is obviously a big one. This is an area where for some time DALL-E 3 has been out a little bit ahead of Midjourney, so it'll be interesting to see where they catch up, and they say things should look, quote, generally more beautiful across the board. Now, interestingly, they say that even though it took them around six months to get this version 6.1 model out, they think they'll be releasing a version 6.2 in the next month or so. There are also some features that are not available yet in version 6.1, specifically inpainting and outpainting, so if you're using their zoom, reframe, repaint, or vary region features, it falls back automatically to the

Starting point is 00:12:35 version 6.0 model. Initial reflections from the community validate a lot of what Midjourney had to say. Nick St. Pierre writes, version 6.1 upscalers are so much better. Wow. Tatiana Tsiguleva did a comparison of a macro close-up shot of an eye, and while version 6 is great, version 6.1 feels nearly indistinguishable from photography, although I'm sure that's what we'll say again when we see version 6.2 and it's even better. Other people are already putting this into a new pipeline. Everett World, for example, took advantage of Runway's newly launched Gen-3 Image to Video features, with images provided by Midjourney v6.1. Others are using Midjourney plus Luma for a similar pipeline, and overall, the capacity for people to create with AI is just

Starting point is 00:13:17 increasing by the day. Now, if you are interested in a deeper analysis of how Midjourney 6 compares to Midjourney 6.1, we will be posting something like that on Superintelligent really soon. We're also in the final days of a summer promotion where we're offering an annual subscription to Superintelligent for 50% off the already discounted rate, so just $96 for the year. If you sign up at besuper.ai, you can use code year 50 to get that. In any case, I now am going to wrap this up so I can go play around with these tools. Appreciate you listening or watching as always. And until next time, peace.
