Tech Brew Ride Home - Tue. 01/10 – Now We Have VALL-E To Take Over My Podcast Voice
Episode Date: January 10, 2023
Well, now there’s VALL-E, a text-to-speech technology that could fully replace me as this podcast narrator. It looks like Microsoft wants to do everything just short of buying OpenAI entirely. More layoffs at Coinbase. Why the whole 5G interfering with airplanes thing still isn’t resolved. And not everything that says it’s ChatGPT is really ChatGPT.
Sponsors: RefundsPro.com
Links:
Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio (Ars Technica)
Microsoft eyes $10 billion bet on ChatGPT (Semafor)
Buy with Prime, which brings Prime to third-party sites, officially launches in U.S. on Jan. 31 (TechCrunch)
Coinbase to slash 20% of workforce in second major round of job cuts (CNBC)
FAA giving airlines another year to fix altimeters that can’t handle 5G signals (Ars Technica)
Sketchy ChatGPT App Soars Up App Store Charts, Charges $7.99 Weekly Subscription (MacRumors)
YouTube Experiment 1
YouTube Experiment 2
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Welcome to the Techmeme Ride Home for Tuesday, January 10th, 2023. I'm Brian McCullough. Today: well, now there's
VALL-E, a text-to-speech technology that could fully replace me as this podcast narrator. It looks like Microsoft
wants to do everything just short of buying OpenAI entirely. More layoffs at Coinbase. Why the whole
5G interfering with airplanes thing still isn't resolved. And not everything that says it's ChatGPT is really
ChatGPT. Here's what you missed today in the world of tech. Well, it seems as though, once again,
my instinct to investigate deeper into a topic was perfectly timed. Microsoft has unveiled
VALL-E, a text-to-speech AI model trained on 60,000 hours of English speech that can simulate a
person's voice from just three seconds of sample audio. Quoting Ars Technica. Once it learns a specific
voice, VALL-E can synthesize audio of that person saying anything, and do it in a way that
attempts to preserve the speaker's emotional tone. Its creators speculate that VALL-E could be used
for high-quality text-to-speech applications, speech editing where a recording of a person could be
edited and changed from a text transcript, making them say something they originally didn't,
and audio content creation when combined with other generative AI models like GPT-3.
Microsoft calls VALL-E a neural codec language model,
and it builds off of a technology called EnCodec,
which Meta announced in October 2022.
Unlike other text-to-speech methods that typically synthesize speech by manipulating waveforms,
VALL-E generates discrete audio codec codes from text
and acoustic prompts.
It basically analyzes how a person sounds,
breaks that information into discrete components, called tokens,
thanks to EnCodec,
and uses training data to match what it knows
about how that voice would sound
if it spoke other phrases outside of the three-second sample.
Microsoft trained VALL-E's speech synthesis capabilities
on an audio library,
assembled by Meta, called LibriLight.
It contains 60,000 hours of English language speech from more than 7,000 speakers,
mostly pulled from LibriVox public domain audiobooks.
For VALL-E to generate a good result,
the voice in the three-second sample must closely match a voice in the training data.
On the VALL-E example website,
Microsoft provides dozens of audio examples of the AI model in action.
Among the samples, the speaker prompt
is the three-second audio provided to VALL-E that it must imitate.
The ground truth is a pre-existing recording of that same speaker saying a particular phrase for comparison purposes,
sort of like the control in the experiment.
The baseline is an example of synthesis provided by a conventional text-to-speech synthesis method,
and the VALL-E sample is the output from the VALL-E model.
While using VALL-E to generate those results,
the researchers only fed the three-second speaker prompt sample and a text string,
what they wanted the voice to say, into VALL-E.
So compare the ground truth sample to the VALL-E sample.
In some cases, the two samples are very close.
Some VALL-E results seem computer generated,
but others could potentially be mistaken for a human's speech,
which is the goal of the model.
In addition to preserving a speaker's vocal timbre and emotional tone,
VALL-E can also imitate the acoustic environment of the sample audio.
For example, if the sample came from a telephone call,
the audio output will simulate the acoustic and frequency properties of a telephone call
in its synthesized output.
That's a fancy way of saying it will sound like a telephone call too.
And Microsoft's samples,
in the Synthesis of Diversity section,
demonstrate that VALL-E can generate variations in voice tone
by changing the random seed used in the generation process.
Perhaps owing to VALL-E's ability to potentially fuel mischief and deception,
Microsoft has not provided VALL-E code for others to experiment with,
so we could not test VALL-E's capabilities.
End quote.
Hey, anyone at Microsoft, Brian here.
If you're listening, I think we've always had a good relationship. I've interviewed both Kevin Scott and Brad Smith in the past. I've been told the C-suite at Microsoft listens to this show regularly. So get in touch. Let's do this. Help me train my voice on VALL-E and use it on this podcast and we'll all have a very public demonstration of how this technology works. You know how to get in touch with me, Brian at Techmeme.com.
headline, or even the only OpenAI and Microsoft headline today, because sources are also
telling Semafor that Microsoft is in talks to invest $10 billion in OpenAI at a $29 billion valuation.
I don't know if I can get this voice to do emphasis, so this is me emphasizing.
They want to invest $10 billion.
They want to all but own the company.
They do this by taking 75% of OpenAI profits until the
investment is recouped and then owning 49% of the company thereafter. Quoting Semafor.
The funding, which would also include other venture firms, would value OpenAI, the firm behind
ChatGPT, at $29 billion, including the new investment, the people said. It's unclear if the deal
has been finalized, but documents sent to prospective investors in recent weeks outlining its
terms indicated a targeted close by the end of 2022. Microsoft's infusion would be part of a complicated
deal in which the company would get 75% of OpenAI profits until it recoups its investment,
the people said. It's not clear whether money that OpenAI spends on Microsoft's cloud computing arm
would count toward evening its account. After that threshold is reached, it would revert to a structure
that reflects ownership of OpenAI, with Microsoft having a 49%
stake, other investors taking another 49%, and OpenAI's nonprofit parent getting 2%. There's also a
profit cap that varies for each set of investors, unusual for venture deals, which investors hope
might return 20 or 30 times their money. The terms and the investment amount could change,
and the deal could fall apart. Microsoft and OpenAI declined to comment. The Wall Street Journal
reported last week that OpenAI was allowing
employees and early investors to sell their shares at a valuation of $29 billion.
The Information reported in October that Microsoft, which had invested $1 billion in cash and
cloud credits into OpenAI in 2019, was in talks to increase its stake.
End quote. Amazon plans to launch Buy with Prime for all U.S.-based merchants, starting on
January 31st, letting them access the usual Prime benefits, expanding on the pre-existing
Fulfillment by Amazon program. Quoting TechCrunch. The service, which allows third-party
merchants to offer Prime benefits like free shipping and returns on their own apps, was initially
only available to those merchants who were already using Fulfillment by Amazon, or FBA, to handle
their shipping and logistics. The service was first introduced in spring of 2022 with FBA merchants and
other select merchants on an invite-only basis. With Buy with Prime, consumers get fast, free delivery,
similar to Amazon.com's Prime service, plus seamless checkout and easier returns, allowing merchants
to establish their own direct relationships with customers, Amazon says. Since its April debut,
Amazon claims the offering has increased shopper conversion by an average of 25%. It notes that it
measured Buy with Prime's success by comparing conversions on the sites where Buy with Prime was offered
as a purchase option to those where it was not during the same time period.
In a press release, Wyze confirmed it was seeing a 25% higher conversion rate on Buy with Prime
and noted it has added the option to all items in its catalog.
Meanwhile, skincare brand Trophy Skin said the option to check out using Buy with Prime
had resulted in a conversion rate increase of over 30%.
And electrolyte drink mix brand Hydralyte, meanwhile, reported a 14% increase in conversion, end quote.
Coinbase this morning announced plans to lay off a further 950 or so employees, or around 20% of its
existing workforce, seeking to reduce its operating expenses by 25% quarter over quarter.
Quoting CNBC. Coinbase, which had roughly 4,700 employees as of the end of September,
already slashed 18% of its workforce in June, citing a need to manage costs and growing too quickly
during the bull market. Quote, with perfect hindsight, looking back, we should have done more,
CEO Brian Armstrong told CNBC in a phone interview.
The best you can do is react quickly once information becomes available, and that's what we're doing in this case, end quote.
Coinbase said the move would result in new expenses of between $149 million and $163 million for the first quarter.
The layoffs, along with other restructuring measures, will bring Coinbase's operating expenses down by 25% for the quarter ending in March, according to a new regulatory filing.
The crypto company also said it expects adjusted EBITDA losses for the
full year to be within a prior $500 million guardrail set last year. After looking at various
stress tests for Coinbase's annual revenue, Armstrong said, quote, it became clear that we would
need to reduce expenses to increase our chances of doing well in every scenario, and there was
no way to do so without reducing headcount. The company will also be shutting down several projects
with a, quote, lower probability of success, end quote. Cryptocurrency markets have been rocked
in recent months following the collapse of one of the industry's biggest players, FTX.
Armstrong pointed to that fallout and increasing pressure on the sector thanks to, quote, unscrupulous actors in the industry, referring to FTX and its founder Sam Bankman-Fried.
The FTX collapse and the resulting contagion has created a black eye for the industry, he said, adding there is likely more shoes to drop.
We may not have seen the last of it.
There will be increased scrutiny on various companies in the space to make sure that they're following the rules, Armstrong said.
Long term, that's a good thing, but short term, there's still a lot of market fear, end quote.
Follow up to a story we talked a lot about, what was it, a year ago, maybe more than that.
The Federal Aviation Administration is proposing a February 1st, 2024 deadline for airlines to replace
or retrofit altimeters that can't filter out 5G signals, preventing full C-band deployment.
Quoting Ars Technica, out of the 7,993 airplanes on the U.S. registry, the FAA said it estimates that
approximately 180 airplanes would require radio altimeter replacement,
and 820 airplanes would require the addition of radio altimeter filters to comply with the proposed
modification requirement, end quote. The total estimated cost of compliance is $26 million. The requirement
could finally end a dispute between the aviation and wireless industries, which has prevented
AT&T and Verizon from fully deploying 5G on the C-band spectrum licenses the wireless carriers purchased
for a combined $69 billion. Airplane altimeters rely on spectrum from 4.2
gigahertz to 4.4 gigahertz, but some cannot filter out 5G transmissions from the carriers' spectrum
in the 3.7 to 3.98 gigahertz range. Some radio altimeters may already demonstrate tolerance to the 5G
C-band emissions without modification, the FAA said. Some may need to install filters between the
radio altimeter and antenna to increase a radio altimeter's tolerance. For others, the addition of a
filter will not be sufficient to address interference susceptibility. Therefore, the radio
altimeter will need to be replaced with an upgraded radio altimeter, end quote.
The FAA said it expects erroneous system warnings due to a malfunctioning radio altimeter
to lead to flight crew becoming desensitized to system warnings. Such desensitization
negates the safety benefits of the warning itself and can lead to a catastrophic event, end
quote. The FAA had said in June of 2022 that airlines must replace or retrofit faulty altimeters
as soon as possible. But the notice issued today said
February 1, 2024, quote, is the date the FAA has determined to be as soon as reasonably
practical, consistent with FAA policy, end quote. The FAA will take public comment on its new
proposal for 30 days before finalizing it. A Bloomberg report quoted lobby group Airlines for
America as saying that airlines, quote, are working diligently to ensure fleets are equipped
with compliant radio altimeters, but global supply chains continue to lag behind current demand.
Any government deadline must consider this reality, end quote.
Finally today, back to our new AI overlords.
With any hype cycle, you tend to find grifters, right?
Allegedly.
Well, in this case, some of our AI overlords may not be exactly what they claim to be.
MacRumors says that a sketchy iOS app called ChatGPT Chat GPT AI With GPT-3,
which is not affiliated with OpenAI at all, is selling subscriptions and has been a top paid App Store app for days.
Quote, ChatGPT is free to use on the web for anyone with an OpenAI
account, but it has inspired scammers and sketchy developers to take advantage of its popularity
for ill-gained profit. One app in particular, named ChatGPT Chat GPT AI With GPT-3, gives the impression
it is the official app for the ChatGPT bot, but appears to have no affiliation to OpenAI,
the creators of ChatGPT, or the bot itself. The app charges users a $7.99 weekly or
$49.99 annual subscription to use the bot unlimited times and eliminate intrusive in-app
ads. The app and its bot are inconsistent, sometimes providing generic or entirely irrelevant
responses to a prompt offered by the user. The app is currently the second most popular
productivity app on the App Store in the United States, indicating it is rather popular. The app has
nearly 12,000 ratings with a number of positive and negative reviews. This is a fake app, one review
said. This is just faking OpenAI endorsement and more bad stuff, another user said. Despite its
suspicious activity, presence, and soaring popularity, the app has passed Apple's App Store review
process multiple times since its initial launch three weeks ago. The developers behind the app,
named Social Media Apps and Game Sports Health Run Hiking Running Fitness Tracking, have other
sketchy apps on the platform, including an Activity Lock Screen Widget 16 app and BetterTrack
Ride Hike Run Swim app. As Austen Allred said on Twitter, quote,
The iOS App Store is full of folks putting ChatGPT into a paid wrapper with ambiguous language
that would let you believe you're paying for ChatGPT.
Learned about this from my dad saying,
ChatGPT seems really cool, but I hate that I have to pay for it to try it out.
Wonder how much money people are making wrapping free open source software in a paid UX.
End quote.
So weirdly, when I switched between those last two voices,
I had to change the prompting.
The male voice couldn't pronounce iOS or ChatGPT properly.
I had to spell it out.
But I had to do no such thing for the female voice.
The female voice could pronounce both normally.
Both voices had problems with the $7.99 thing.
I had to write that out too, or else they both said it as dollar sign 7.99.
Now, here's a wild thought.
What if every time I quoted people's tweets or even every time I, you know,
quoted people directly on the show. I used a different voice. I could do that. I wouldn't have to
say quote end quote all the time, because it would be obvious, right? Anyway, the voice for the first
segment was labeled as Scottish English, which was cool. The second segment was labeled as
newscaster, and I, of course, forgot to link to those two YouTube video experiments I did in the show
notes yesterday, so hopefully I've remembered to put those at the end of the show notes today. The voices
you've just heard were generated using text I wrote this morning, but the two videos I mentioned
were generated entirely by using prompts on ChatGPT. I didn't write a single word. Again, more on
all this soon when Chris and I do another Twitter space. Talk to you tomorrow.
