The Changelog: Software Development, Open Source - Dataset wars, Bark, Kent Beck needs to recalibrate, StableLM & blind prompting is not prompt engineering (News)
Episode Date: April 24, 2023
The dataset wars are heating up, Bark is a transformer-based text-to-audio model that can generate highly realistic, multilingual speech as well as other audio, Kent Beck needs to recalibrate after using ChatGPT, the team behind Stable Diffusion release a new open source language model & Mitchell Hashimoto weighs in on prompt engineering.
Transcript
What up nerds, I'm Jared and this is ChangeLog News for the week of Monday, April 24th, 2023.
One of my favorite things about our new email format is we no longer proxy links through email.changelog.com.
That's awesome for two reasons.
First, privacy. We have no idea which links
you're clicking on. And two, user experience. You can hover on a link to see where you're headed
first. I do that all the time. If you appreciate direct links as much as I do, pop in your email
address at changelog.com slash news. And if you've already done that, please tell a friend about the show. Okay, let's get into the news. The dataset wars are heating up. The New York Times reports that
Reddit will begin charging for access to its API. They appear to be following Twitter's playbook
here only with much better tactics because they won't be charging small time researchers or indie
bot and app developers.
It's companies like Google and OpenAI who want the data to power their machine learning projects who will have to pony up. Stack Overflow is also getting in on that action. I expect this will
become increasingly common, as AI-focused product offerings require ever larger, more diverse, and higher-quality data sources.
At the same time, the companies best positioned to provide those data sources are seeing less and less advertising revenue.
It actually feels like a better business model for the Reddits, Stack Overflows, and Pinterests
of the world.
And as an end-user of these systems, for some reason I feel better
about trading in my data to be compressed alongside millions of others and synthesized
by an AI than I do having it used to profile me for personalized ads. Not that they won't do both,
but still. Are you with me on that sentiment, or am I out on a limb here? Let me know in the comments. The team at Suno AI is helping
change the game in text-to-speech realism by releasing Bark, a transformer-based text-to-audio
model that can generate highly realistic multilingual speech as well as other audio,
including music, background noise, and simple sound effects. It can also laugh, sigh, cry, and make other non-word sounds that people make.
Crazy, right?
Here's an example that includes sad and sighs meta tags.
My friend's bakery burned down last night.
Now his business is toast.
And here's one more with laughter.
I don't like PyTorch, Kubernetes, or Schnitzel.
And xylophones flummox me.
You can still hear some digital artifacts and blips here and there,
but we're getting closer to synthesized audio
that's indistinguishable from the real thing.
And that's cool slash scary.
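For the curious, using Bark from Python is only a few lines. Here's a sketch based on the suno-ai/bark README around the time of release; the prompt text and output filename are just illustrative, and the API may have shifted since.

```python
# Minimal Bark sketch, following the usage shown in the suno-ai/bark README.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and cache the Bark model weights (several GB on first run).
preload_models()

# Bracketed meta tags like [sighs] or [laughter] nudge the model to produce
# non-speech sounds alongside the spoken text.
text_prompt = "My friend's bakery burned down last night. [sighs] Now his business is toast."
audio_array = generate_audio(text_prompt)

# Write the generated waveform to disk at Bark's native sample rate.
write_wav("bark_output.wav", SAMPLE_RATE, audio_array)
```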
In a tweet that went viral last week, extreme programming creator Kent Beck proclaimed,
quote, I've been reluctant to try ChatGPT. Today, I got over that reluctance. Now I understand why
I was reluctant. The value of 90% of my skills just dropped to $0. The leverage for the remaining 10% went up 1,000x. I need to recalibrate. End quote.
He expands on that tweet in a full-on blog post where he tells the story of his aha moment.
Oh, and if you're hoping for a scientific explanation of that 90-10 split
and which remaining skills got the 1000x boost, don't get your hopes up.
Kent says he was just
extrapolating wildly from a couple of his experiences, which is what he does. Stability AI,
the team behind Stable Diffusion, released a new open source language model they're calling
Stable LM. It's currently available in 3 billion and 7 billion parameter versions, with 15 billion to 65 billion parameter models coming soon.
This model is usable and adaptable for both commercial and research purposes.
They're also releasing a set of research models that are instruction fine-tuned on a combination of open datasets: Alpaca, GPT4All, Dolly, ShareGPT, and HH.
Some of these we've covered on the pod, others I haven't even heard of.
I love how much the open-ish AI advancements build and feed off one another, because the rising tide lifts all boats.
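If you want to kick the tires, the StableLM checkpoints load like any other causal language model via Hugging Face Transformers. Here's a minimal sketch; the model id I'm using is an assumption based on Stability AI's announcement, so double-check the hub listing before you run it.

```python
# Sketch: loading a StableLM checkpoint with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id; verify the exact names, sizes, and license on the Hub.
model_id = "stabilityai/stablelm-base-alpha-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# The 7B model is large; a GPU (and half precision) is strongly recommended.
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "Open source language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```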
It is now time for some sponsored news.
Thanks, Sentry!
Instead of spending time writing tests with little to no visibility into whether those tests actually
give you meaningful coverage in a given change, using Sentry's integration with CodeCov lets
you see the untested code causing errors directly in the Sentry issue stack trace,
which means no more time wasted trying to analyze your code base to find out where you need test coverage.
Here's what Alex Nathaniel, the director of technology at Vecter, has to say about it.
Quote,
With the Sentry and CodeCov integration, I no longer have to analyze our code base
and spend cycles thinking about where we need test coverage.
Instead, Sentry just tells me exactly where I need to focus,
saving me several weeks out of my year and reducing my time spent on building test coverage by nearly 50%.
Check the link in your show notes and chapter data to learn more about using Sentry with CodeCov and how to get all set
up. Thanks again to Sentry for sponsoring Changelog News. Mitchell Hashimoto weighs in on prompt
engineering in a long, detailed article titled Prompt Engineering vs. Blind Prompting. His
premise, quote, a lot of people who claim to be doing prompt engineering today are actually just blind prompting.
Blind prompting is a term I am using to describe the method of creating prompts with a crude trial and error approach paired with minimal or no testing and a very surface level knowledge of prompting.
Blind prompting is not prompt engineering.
End quote.
I feel so seen.
Mitchell goes on to make the argument that prompt engineering is a real skill that can be developed based on real experimental methodologies.
He demonstrates this with a realistic example, walking through the process of prompt engineering a solution that provides practical value to an application. If the way you've been using these new AI tools is aptly described by Mitchell as blind prompting,
definitely give this one a read.
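To make Mitchell's point concrete, here's a rough sketch of measured prompting versus blind trial and error: candidate prompts scored against a small labeled set. The task, prompts, and ask_model helper are hypothetical placeholders, not code from his article.

```python
# Score candidate prompts against a small labeled "demonstration set"
# instead of eyeballing one-off replies.

def ask_model(prompt: str) -> str:
    # Placeholder so the sketch runs without credentials; swap in a real
    # completion/chat call from your LLM provider here.
    return "positive"

# Tiny labeled set: inputs paired with the answers we expect back.
demonstration_set = [
    ("The checkout flow is so much faster now, love it", "positive"),
    ("App crashes every time I open my cart", "negative"),
]

# Candidate prompt templates we want to compare.
candidate_prompts = [
    "Classify the sentiment of this app review as positive or negative: {text}",
    "You label app reviews. Reply with exactly one word, positive or negative.\nReview: {text}",
]

# Exact-match accuracy per template over the demonstration set.
for template in candidate_prompts:
    correct = sum(
        ask_model(template.format(text=text)).strip().lower() == expected
        for text, expected in demonstration_set
    )
    print(f"{correct}/{len(demonstration_set)} correct for: {template!r}")
```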
That's the news for now.
I'll finish out this episode with some shout-outs
to our newest Changelog++ supporters.
Thank you to Jordan, Brian, Willoja, Liam, Jack, Richard,
Aaron, Emmanuel, David, John, Richard, Matthew, Joe, Max, Carl, and Anthony.
If you've never heard of Changelog++, check it out.
It's our membership program where you can directly support our work, make the ads disappear, and get in on cool bonuses like extended episodes and shoutouts from me on this very program.
On this week's changelog interview
episode, Adam sits down with Andrew Klein from Backblaze. They're chatting about hard drive
reliability and how they manage more than 250,000 hard drives. Have a great week. Share Changelog
with your friends if you dig it, and we'll talk to you again real soon.