Tech Brew Ride Home - Tue. 01/10 – Now We Have VALL-E To Take Over My Podcast Voice

Starting point is 00:00:00 On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco. Hey, who did this to you? What happened next turned the story into a political firestorm. Reports have identified the victim as Bob Lee, the founder of Cash App. From Bloomberg Podcasts, this is Foundering: The Killing of Bob Lee, beginning April 16. Welcome to the Techmeme Ride Home for Tuesday, January 10th, 2023. I'm Brian McCullough. Today: well, now there's VALL-E, a text-to-speech technology that could fully replace me as this podcast's narrator. It looks like Microsoft wants to do everything just short of buying OpenAI entirely. More layoffs at Coinbase. Why the whole

Starting point is 00:00:54 5G-interfering-with-airplanes thing still isn't resolved, and not everything that says it's ChatGPT is really ChatGPT. Here's what you missed today in the world of tech. Well, it seems as though, once again, my instinct to investigate deeper into a topic was perfectly timed. Microsoft has unveiled VALL-E, a text-to-speech AI model trained on 60,000 hours of English speech that can simulate a person's voice from just three seconds of sample audio. Quoting Ars Technica. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything, and do it in a way that attempts to preserve the speaker's emotional tone. Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be

Starting point is 00:01:49 edited and changed from a text transcript, making them say something they originally didn't, and audio content creation when combined with other generative AI models like GPT-3. Microsoft calls VALL-E a neural codec language model, and it builds off of a technology called EnCodec, which Meta announced in October 2022. Unlike other text-to-speech methods that typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from text and acoustic prompts.
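To make "discrete audio codec codes" a little more concrete, here is a toy sketch in Python. It is not EnCodec and not VALL-E: EnCodec learns its codes with a neural network, while this example only uses plain uniform quantization to show the general idea of turning a continuous waveform into a short list of integers and back. The function names, the 8-level setting, and the sine wave standing in for audio are all assumptions made for illustration.

```python
import math


def quantize(samples, levels=8):
    """Map floats in [-1.0, 1.0] to integer codes in [0, levels - 1]."""
    codes = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        # Scale [-1, 1] onto [0, levels - 1] and round to the nearest code.
        codes.append(round((clipped + 1.0) / 2.0 * (levels - 1)))
    return codes


def dequantize(codes, levels=8):
    """Map integer codes back to approximate floats in [-1.0, 1.0]."""
    return [c / (levels - 1) * 2.0 - 1.0 for c in codes]


# One cycle of a sine wave sampled at 8 points stands in for "audio".
wave = [math.sin(2 * math.pi * i / 8) for i in range(8)]
codes = quantize(wave)       # a short list of small integers ("tokens")
approx = dequantize(codes)   # a lossy reconstruction of the waveform

if __name__ == "__main__":
    print(codes)
    print([f"{a:.2f}" for a in approx])
```

The point of the sketch is only that a model working on such integer codes can treat audio a bit like text tokens; the real system's codes carry far more information than this.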

Starting point is 00:02:27 It basically analyzes how a person sounds, breaks that information into discrete components, called tokens, thanks to EnCodec, and uses training data to match what it knows about how that voice would sound if it spoke other phrases outside of the three-second sample. Microsoft trained VALL-E's speech synthesis capabilities on an audio library,

Starting point is 00:02:52 assembled by Meta, called LibriLight. It contains 60,000 hours of English-language speech from more than 7,000 speakers, mostly pulled from LibriVox public domain audiobooks. For VALL-E to generate a good result, the voice in the three-second sample must closely match a voice in the training data. On the VALL-E example website, Microsoft provides dozens of audio examples of the AI model in action. Among the samples, the speaker prompt

Starting point is 00:03:25 is the three-second audio provided to VALL-E that it must imitate. The ground truth is a pre-existing recording of that same speaker saying a particular phrase for comparison purposes, sort of like the control in the experiment. The baseline is an example of synthesis provided by a conventional text-to-speech synthesis method, and the VALL-E sample is the output from the VALL-E model. While using VALL-E to generate those results, the researchers only fed the three-second speaker prompt sample and a text string, what they wanted the voice to say, into VALL-E.

Starting point is 00:04:04 So compare the ground truth sample to the VALL-E sample. In some cases, the two samples are very close. Some VALL-E results seem computer generated, but others could potentially be mistaken for a human's speech, which is the goal of the model. In addition to preserving a speaker's vocal timbre and emotional tone, VALL-E can also imitate the acoustic environment of the sample audio. For example, if the sample came from a telephone call,

Starting point is 00:04:34 the audio output will simulate the acoustic and frequency properties of a telephone call in its synthesized output. That's a fancy way of saying it will sound like a telephone call too. And Microsoft's samples in the Synthesis of Diversity section demonstrate that VALL-E can generate variations in voice tone by changing the random seed used in the generation process. Perhaps owing to VALL-E's ability to potentially fuel mischief and deception,

Starting point is 00:05:05 Microsoft has not provided VALL-E code for others to experiment with, so we could not test VALL-E's capabilities. End quote. Hey, anyone at Microsoft, Brian here. If you're listening, I think we've always had a good relationship. I've interviewed both Kevin Scott and Brad Smith in the past. I've been told the C-suite at Microsoft listens to this show regularly. So get in touch. Let's do this. Help me train my voice on VALL-E and use it on this podcast, and we'll all have a very public demonstration of how this technology works. You know how to get in touch with me, Brian at Techmeme.com. And that's not the only headline, or even the only OpenAI and Microsoft headline today, because sources are also telling Semafor that Microsoft is in talks to invest $10 billion in OpenAI at a $29 billion valuation. I don't know if I can get this voice to do emphasis, so this is me emphasizing.

Starting point is 00:06:13 They want to invest $10 billion. They want to all but own the company. They do this by taking 75% of OpenAI profits until the investment is recouped and then owning 49% of the company thereafter. Quoting Semafor. The funding, which would also include other venture firms, would value OpenAI, the firm behind ChatGPT, at $29 billion, including the new investment, the people said. It's unclear if the deal has been finalized, but documents sent to prospective investors in recent weeks outlining its terms indicated a targeted close by the end of 2022. Microsoft's infusion would be part of a complicated

Starting point is 00:06:56 deal in which the company would get 75% of OpenAI profits until it recoups its investment, the people said. It's not clear whether money that OpenAI spends on Microsoft's cloud computing arm would count toward evening its account. After that threshold is reached, it would revert to a structure that reflects ownership of OpenAI, with Microsoft having a 49% stake, other investors taking another 49%, and OpenAI's nonprofit parent getting 2%. There's also a profit cap that varies for each set of investors, unusual for venture deals, which investors hope might return 20 or 30 times their money. The terms and the investment amount could change, and the deal could fall apart. Microsoft and OpenAI declined to comment. The Wall Street Journal

Starting point is 00:07:45 reported last week that OpenAI was allowing employees and early investors to sell their shares at a valuation of $29 billion. The Information reported in October that Microsoft, which had invested $1 billion in cash and cloud credits into OpenAI in 2019, was in talks to increase its stake. End quote. Amazon plans to launch Buy with Prime for all U.S.-based merchants, starting on January 31st, letting them access the usual Prime benefits, expanding on the pre-existing Fulfillment by Amazon program. Quoting TechCrunch. The service, which allows third-party merchants to offer Prime benefits like free shipping and returns on their own apps, was initially

Starting point is 00:08:34 only available to those merchants who were already using Fulfillment by Amazon, or FBA, to handle their shipping and logistics. The service was first introduced in spring of 2022 with FBA merchants and other select merchants on an invite-only basis. With Buy with Prime, consumers get fast, free delivery, similar to Amazon.com's Prime service, plus seamless checkout and easier returns, allowing merchants to establish their own direct relationships with customers, Amazon says. Since its April debut, Amazon claims the offering has increased shopper conversion by an average of 25%. It notes that it measured Buy with Prime's success by comparing conversions on the sites where Buy with Prime was offered as a purchase option to those where it was not during the same time period.
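For anyone wondering what a comparison like that looks like as arithmetic, here is a minimal Python sketch. It assumes the 25% figure is a relative lift in conversion rate, which the quoted description suggests but does not spell out, and the visit and order counts are made up purely for illustration. Nothing here is Amazon's actual methodology.

```python
def conversion_rate(orders, visits):
    """Share of visits that ended in an order."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return orders / visits


def relative_lift(treated_rate, control_rate):
    """Relative change of the treated rate over the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treated_rate - control_rate) / control_rate


# Hypothetical numbers: same time period, with and without the option.
with_option = conversion_rate(orders=50, visits=1000)
without_option = conversion_rate(orders=40, visits=1000)
lift = relative_lift(with_option, without_option)

if __name__ == "__main__":
    print(f"Lift: {lift:.0%}")
```

Note that a 25% relative lift here means going from 4 orders per 100 visits to 5, not adding 25 percentage points.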

Starting point is 00:09:19 In a press release, Wyze confirmed it was seeing a 25% higher conversion rate on Buy with Prime and noted it has added the option to all items in its catalog. Meanwhile, skincare brand Trophy Skin said the option to check out using Buy with Prime had resulted in a conversion rate increase of over 30%. And electrolyte drink mix brand Hydralyte, meanwhile, reported a 14% increase in conversion, end quote. Coinbase this morning announced plans to lay off a further 950 or so employees, or around 20% of its existing workforce, seeking to reduce its operating expenses by 25% quarter over quarter. Quoting CNBC, Coinbase, which had roughly 4,700 employees as of the end of September,

Starting point is 00:10:08 already slashed 18% of its workforce in June, citing a need to manage costs and growing too quickly during the bull market. Quote, with perfect hindsight, looking back, we should have done more, CEO Brian Armstrong told CNBC in a phone interview. The best you can do is react quickly once information becomes available, and that's what we're doing in this case, end quote. Coinbase said the move would result in new expenses of between $149 million and $163 million for the first quarter. The layoffs, along with other restructuring measures, will bring Coinbase's operating expenses down by 25% for the quarter ending in March, according to a new regulatory filing. The crypto company also said it expects adjusted EBITDA losses for the full year to be within a prior $500 million guardrail set last year. After looking at various

Starting point is 00:10:54 stress tests for Coinbase's annual revenue, Armstrong said, quote, it became clear that we would need to reduce expenses to increase our chances of doing well in every scenario, and there was no way to do so without reducing headcount. The company will also be shutting down several projects with a, quote, lower probability of success, end quote. Cryptocurrency markets have been rocked in recent months following the collapse of one of the industry's biggest players, FTX. Armstrong pointed to that fallout and increasing pressure on the sector thanks to, quote, unscrupulous actors in the industry, referring to FTX and its founder Sam Bankman-Fried. The FTX collapse and the resulting contagion has created a black eye for the industry, he said, adding there is likely more shoes to drop. We may not have seen the last of it.

Starting point is 00:11:36 There will be increased scrutiny on various companies in the space to make sure that they're following the rules, Armstrong said. Long term, that's a good thing, but short term, there's still a lot of market fear, end quote. Follow-up to a story we talked a lot about, what was it, a year ago, maybe more than that. The Federal Aviation Administration is proposing a February 1st, 2024 deadline for airlines to replace or retrofit altimeters that can't filter out 5G signals, preventing full C-band deployment. Quoting Ars Technica, out of the 7,993 airplanes on the U.S. registry, the FAA said it estimates that approximately 180 airplanes would require radio altimeter replacement, and 820 airplanes would require the addition of radio altimeter filters to comply with the proposed

Starting point is 00:12:28 modification requirement, end quote. The total estimated cost of compliance is $26 million. The requirement could finally end a dispute between the aviation and wireless industries, which has prevented AT&T and Verizon from fully deploying 5G on the C-band spectrum licenses the wireless carriers purchased for a combined $69 billion. Airplane altimeters rely on spectrum from 4.2 gigahertz to 4.4 gigahertz, but some cannot filter out 5G transmissions from the carriers' spectrum in the 3.7 to 3.98 gigahertz range. Some radio altimeters may already demonstrate tolerance to the 5G C-band emissions without modification, the FAA said. Some may need to install filters between the radio altimeter and antenna to increase a radio altimeter's tolerance. For others, the addition of a

Starting point is 00:13:16 filter will not be sufficient to address interference susceptibility. Therefore, the radio altimeter will need to be replaced with an upgraded radio altimeter, end quote. The FAA said it expects erroneous system warnings due to a malfunctioning radio altimeter to lead to flight crew becoming desensitized to system warnings. Such desensitization negates the safety benefits of the warning itself and can lead to a catastrophic event, end quote. The FAA had said in June of 2022 that airlines must replace or retrofit faulty altimeters as soon as possible. But the notice issued today said February 1, 2024, quote, is the date the FAA has determined to be as soon as reasonably

Starting point is 00:13:55 practical, consistent with FAA policy, end quote. The FAA will take public comment on its new proposal for 30 days before finalizing it. A Bloomberg report quoted lobby group Airlines for America as saying that airlines, quote, are working diligently to ensure fleets are equipped with compliant radio altimeters, but global supply chains continue to lag behind current demand. Any government deadline must consider this reality, end quote. Finally today, back to our new AI overlords. With any hype cycle, you tend to find grifters, right? Allegedly.

Starting point is 00:14:34 Well, in this case, some of our AI overlords may not be exactly what they claim to be. MacRumors says that a sketchy iOS app called ChatGPT Chat GPT AI With GPT-3, which is not affiliated with OpenAI at all, is selling subscriptions and has been a top paid App Store app for days. Quote, ChatGPT is free to use on the web for anyone with an OpenAI account, but it has inspired scammers and sketchy developers to take advantage of its popularity for ill-gained profit. One app in particular, named ChatGPT Chat GPT AI With GPT-3, gives the impression it is the official app for the ChatGPT bot, but appears to have no affiliation to OpenAI, the creators of ChatGPT, or the bot itself. The app charges users a $7.99 weekly or

Starting point is 00:15:20 $49.99 annual subscription to use the bot unlimited times and eliminate intrusive in-app ads. The app and its bot are inconsistent, sometimes providing generic or entirely irrelevant responses to a prompt offered by the user. The app is currently the second most popular productivity app on the App Store in the United States, indicating it is rather popular. The app has nearly 12,000 ratings with a number of positive and negative reviews. This is a fake app, one review said. This is just faking OpenAI endorsement and more bad stuff, another user said. Despite its suspicious activity, presence, and soaring popularity, the app has passed Apple's App Store review process multiple times since its initial launch three weeks ago. The developers behind the app,

Starting point is 00:16:04 named Social Media Apps and Game Sports Health Run Hiking Running Fitness Tracking, have other sketchy apps on the platform, including an Activity Lock Screen Widget 16 app and Better Track Ride Hike Run Swim app. As Austen Allred said on Twitter, quote, The iOS App Store is full of folks putting ChatGPT into a paid wrapper with ambiguous language that would let you believe you're paying for ChatGPT. Learned about this from my dad saying, ChatGPT seems really cool, but I hate that I have to pay for it to try it out. Wonder how much money people are making wrapping free open source software in a paid UX.

Starting point is 00:16:40 End quote. So weirdly, when I switched between those last two voices, I had to change the prompting. The male voice couldn't pronounce iOS or ChatGPT properly. I had to spell it out. But I had to do no such thing for the female voice. The female voice could pronounce both normally. Both voices had problems with the $7.99 thing.

Starting point is 00:17:10 I had to write that out too, or else they both said it as dollar sign 7.99. Now, here's a wild thought. What if every time I quoted people's tweets, or even every time I, you know, quoted people directly on the show, I used a different voice? I could do that. I wouldn't have to say quote, end quote all the time, because it would be obvious, right? Anyway, the voice for the first segment was labeled as Scottish English, which was cool. The second segment was labeled as newscaster, and I, of course, forgot to link to those two YouTube video experiments I did in the show notes yesterday, so hopefully I've remembered to put those at the end of the show notes today. The voices

Starting point is 00:17:50 you've just heard were generated using text I wrote this morning, but the two videos I mentioned were generated entirely by using prompts on ChatGPT. I didn't write a single word. Again, more on all this soon when Chris and I do another Twitter Space. Talk to you tomorrow.
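
The manual fixes described in this episode, spelling out iOS and ChatGPT for one voice and writing out $7.99 for both, could in principle be scripted as a pre-processing step before the text goes to a voice generator. What follows is only a rough Python sketch of that idea. The respellings, the function names, and the price format handled are all assumptions for illustration; the tool actually used on the show is not named, and a given voice may need different spellings or none at all.

```python
import re

# Hypothetical respellings; a real voice may need different ones, or none.
RESPELLINGS = {
    "iOS": "eye oh ess",
    "ChatGPT": "chat gee pee tee",
}

# Matches simple prices such as $7.99 or $49.99 (dollars and two-digit cents).
PRICE = re.compile(r"\$(\d+)\.(\d{2})\b")


def _count(n, word):
    """Return '1 dollar', '2 dollars', and so on."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def _spell_price(match):
    dollars = int(match.group(1))
    cents = int(match.group(2))
    return f"{_count(dollars, 'dollar')} and {_count(cents, 'cent')}"


def normalize_for_tts(text):
    """Rewrite prices and tricky terms so a voice reads them as intended."""
    text = PRICE.sub(_spell_price, text)
    for term, spoken in RESPELLINGS.items():
        # Whole-word, case-sensitive replacement of each listed term.
        text = re.sub(rf"\b{re.escape(term)}\b", spoken, text)
    return text


if __name__ == "__main__":
    print(normalize_for_tts("The iOS app charges $7.99 weekly for ChatGPT."))
```

Keeping the respellings in one table per voice would match what happened here, where one voice needed them and the other did not.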

