The Economics of Everyday Things - 90. Closed Captions
Episode Date: April 28, 2025
It takes a highly skilled stenographer, and some specialized equipment, to transcribe TV dialogue in real time at 300 words per minute. Will A.I. rewrite the script? Zachary Crockett tries to keep up.
SOURCES:
Doug Karlovits, general manager at Verbit.
Katie Ryan, live steno captioner at Verbit.
RESOURCES:
"The Long Case for Machine Shorthand," by Sam Corbin (New York Times, 2024).
"Caption This: Why Subtitling Is Big Business Amid the Content Boom," by Kirsten Chuba (The Hollywood Reporter, 2023).
"Everyone Watches TV with Subtitles Now. How’d That Happen?" by Wilson Chapman (IndieWire, 2023).
"When is Captioning Required?" (National Association of the Deaf).
Transcript
Katie Ryan's home office in Pittsburgh, Pennsylvania is pretty run-of-the-mill.
I just have a regular Ikea desk.
I have a big TV up on the wall.
I have a laptop stand with my laptop on it, and then I have a monitor stand that has two
monitors on it.
There's a blanket on the floor for my dog, you know.
But the work she does at this desk is seen by millions of people every week.
I've done the Super Bowl a handful of times. I've done the Olympics many times. I just
did the Oscars a couple weeks ago. Any sporting event that you can think of, I've probably
done it. Any major news event that has happened, I have probably been involved in that somehow.
Presidential funerals, presidential debates.
I remember when the Boston Marathon bombing happened, the breaking news was just constant.
I think I was on the air writing without a commercial break for something like three
and a half hours.
Ryan is a captioner.
She writes the text transcripts that appear on your TV screen when you turn on closed
captioning.
She does this in real time.
Most people think their TV just does it.
They don't realize that there's a person like me sitting in a room with headphones on.
And people don't realize that it's happening live.
Like if I'm writing a news broadcast or a sporting event, maybe I have like five seconds
extra than you do when you're hearing it.
And I have to write it at the same time and try and keep up with all the speedy talkers
that are out there.
In some ways, it's a good business to be in.
One survey found that 50% of Americans and 70% of
Gen Z viewers say they watch content with captions on most of the time. But
the industry is also rapidly changing. The nimble fingers of human captioners
like Katie Ryan are up against the neural networks of artificial intelligence services.
Technology is the key to the future of captioning,
but you know, you need people that are looking at the content.
For the Freakonomics Radio Network, this is The Economics of Everyday Things.
I'm Zachary Crockett. Today, closed captions.
The term captions is often used interchangeably with subtitles,
but the two are different.
Subtitles are used for translation.
Captions are designed for people with hearing impairments,
and they describe every auditory element—
dialogue, sound effects, music,
and sometimes even background noises.
The goal of captioning is to give the user
the content of exactly what's being heard.
That's Doug Karlovits.
He's a general manager at Verbit,
the largest provider of captions in America.
He says that if you're watching something on TV, either live or pre-recorded, you can
almost always turn on the captions in the device's settings.
But that wasn't always an option.
Really, captions were born for television in 1970.
The first pre-recorded show ever captioned was The French Chef with Julia Child.
The earliest efforts were called open captions, and they were limited to pre-recorded shows.
The text was a permanent part of the video.
Eventually, a new method called closed captions made it possible for
viewers to turn the text on and off. And by the 1980s, thanks to the efforts of the nonprofit
National Captioning Institute, captions could also be used for live television. Around this
time, Karlovits's father Joe saw an opportunity to expand the captioning industry.
My father was a court reporter, a stenographer,
and he became very interested in computers
and how to take his stenotype and get it translated
through a computer into English.
Stenographers are extremely fast typists.
On stenotype machines, they can transcribe up to 300 words per minute.
Joe began training fellow stenographers to do TV captioning, and in 1986, he founded
a company called VITAC, which was later acquired by another company called Verbit.
We started out with a local television station in Pittsburgh and eventually grew into the largest provider in North America of captioning.
Today, broadcasters, cable companies, and satellite services are required by federal laws to have captions available for nearly every televised program. This also carries over to much of the media on streaming services online, and most video
content in public settings, like courtrooms, hospitals, schools, and sports bars.
Captions have to be readable, accurate, and inclusive of all audio context.
They have to clearly identify each speaker, and for live broadcasts, like news programs,
they appear almost in
real time.
In the United States, everything that airs on television should have captions today.
Almost every show has captions on it.
VITAC is one of three companies alongside IBM and Zoo Digital Group that control around
60% of the captioning market.
Karlovits says they caption around 500,000 hours of content a year.
We work with all the major broadcasters, all the various producers of television programs.
Work with all the different universities around the world, providing captions for the classroom.
On the legal side, we're working with law firms and court reporting agencies.
And on the government side,
we'll do anything from town halls to training on all the different things.
We also work with sports venues, theaters.
So everywhere where words are spoken,
there's the opportunity to add captions.
Much of today's captioning has shifted
from human stenographers to automated tools.
In some cases, the captioning service uses a technique
called re-speaking.
A human employee watches a show in a recording booth
and carefully recites every word into a special
microphone. Voice-to-text software turns the narration into a written transcript. In other
cases, particularly with pre-recorded TV shows, technology can be used to generate text from
a script. But for live TV, like news broadcasts, Super Bowls, and presidential debates, a human captioner clacking
away at a machine is still the most reliable option.
A stenographer gets a live feed of a network's audio a few seconds before it goes to the
general public.
They listen through a pair of headphones while typing out the words in shorthand on their
stenotype machine.
This shorthand goes through processing software
on a computer that turns it into text.
The text is embedded in a video signal
that's transmitted to the television network
through modems and IP connections.
And when you press the closed captions button
on your remote, a microchip inside your TV
retrieves and displays the captions on screen.
It's a complex process, and networks might pay Verbit anywhere from $130 to $175 per
hour for live human captioning services.
So if you have a broadcast show that's in a 30-minute block, but it may be really only
on the air for 24 minutes, they would pay for that on a per-minute basis.
If you're doing a live show, you're paying basically for the times that are booked, because
you don't know how long those live shows can go.
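The two billing approaches described here can be compared with a little arithmetic. The following is a rough sketch, not Verbit's actual pricing logic: the helper name `caption_cost` is hypothetical, and it assumes the quoted $130 to $175 hourly range for live captioning simply prorates by the minute.

```python
# Hypothetical helper for illustration. Only the $130-$175 hourly range and
# the 24-minute / 30-minute example come from the episode; prorating the
# hourly rate by the minute is an assumption.
LOW_RATE = 130.0   # dollars per hour, low end of the quoted range
HIGH_RATE = 175.0  # dollars per hour, high end of the quoted range


def caption_cost(minutes, hourly_rate):
    """Cost in dollars of captioning `minutes` of content at `hourly_rate`."""
    return round(minutes * hourly_rate / 60, 2)


for rate in (LOW_RATE, HIGH_RATE):
    on_air = caption_cost(24, rate)  # billed per minute actually aired
    booked = caption_cost(30, rate)  # billed for the whole booked block
    print(f"${rate:.0f}/hour: 24 min on air = ${on_air:.2f}, "
          f"30 min booked = ${booked:.2f}")
```

Under these assumptions, the gap between the two figures is the cost of the six minutes that are booked but never aired.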
So who are these humans who create the captions on TV?
And what's it like to be on the clock during a live broadcast?
Sometimes you can't even get a drink of water.
That's coming up.
Katie Ryan didn't start out hoping to be a professional captioner.
When I was graduating high school, I really didn't know what I wanted to do with my life.
And my great aunt Sandy, her sister at the time,
was an official court reporter in Philadelphia.
And Sandy said, well, you can type fast on a keyboard.
Why don't you look into stenography?
Ryan completed a court reporting program
at a community college in Pittsburgh
and joined VITAC, now Verbit, after graduating.
She's been at the company as a captioner for more than two decades.
In her work, Ryan uses a machine called a Stenotype. It has a small screen and around
20 unmarked keys that look kind of like popsicle sticks. She's able to type at speeds of up to
300 strokes per minute using a technique called
chording. She presses down on multiple keys simultaneously to phonetically spell out whole
syllables, words, and phrases with one motion.
Stenography is essentially learning another language. It's combinations of keys to make
words. And so on the machine, each key has a letter,
and then there are combinations of keys
that make more letters.
P, B would be N.
The letter I would be E, U.
The letter D would be T, K.
And then there are combinations of keys that make words.
So and would be A, P, B, D.
Your hands are on different sides of the keyboard
on the machine.
Your left hand is prefixes, your right hand is suffixes.
And then you have your endings,
I-N-G, S, E-D on your right side.
Ryan can spell out entire phrases
with just a few keystrokes.
A good example would be like ladies and gentlemen.
That would be good for TV or court.
On my machine, it would be L-A-I-R-J.
So you hit all of those keys at once and ladies and gentlemen will come out in your computer
software.
In one fell swoop.
In one stroke, you get all of those words.
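The chord-to-text lookup Ryan describes can be pictured as a dictionary keyed by the set of keys pressed together. This is only an illustrative sketch: the name `translate` and the data layout are assumptions, the chords are spelled the way they are spoken in this episode rather than in real steno notation, and actual captioning software is far more involved.

```python
# Illustrative sketch only: chords are spelled as spoken in the episode,
# and every name here is hypothetical rather than taken from real software.

# Each chord is the set of keys pressed at the same time, so the order in
# which the letters are listed does not matter.
CHORDS = {
    frozenset("PB"): "N",
    frozenset("EU"): "I",
    frozenset("TK"): "D",
    frozenset("APBD"): "and",
    frozenset("LAIRJ"): "ladies and gentlemen",
}


def translate(strokes):
    """Turn a sequence of strokes into text, one dictionary lookup per stroke."""
    output = []
    for stroke in strokes:
        keys = frozenset(stroke.upper())
        # Unknown chords pass through unchanged so they remain visible.
        output.append(CHORDS.get(keys, stroke))
    return " ".join(output)


print(translate(["LAIRJ"]))
```

Keying on a set rather than a string mirrors the idea that all the keys go down at once: a single stroke yields the whole phrase.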
Before she goes live,
Ryan creates a dictionary full of customized briefs,
abbreviations of specific words
that she knows will reoccur throughout the broadcast.
For the Academy Awards,
she'll program combinations of keystrokes
for the title of each nominated movie.
For a hockey game, she'll program every player's name.
Instead of having to write out their name every single time that it's said, you hit that one
combination of keys one time or twice and then that whole name will come out.
Obviously, we have to search ahead of time to find out who like your play-by-play announcer is and who your color analyst is.
But the process doesn't usually go without a hitch or two.
Captioners are human, after all,
and they make the occasional mistake.
While there's no federally mandated benchmark,
the standard for accuracy in the industry is 99%,
meaning one out of every 100 words
might be misspelled or altogether butchered.
Oftentimes, a captioner is aware of a typo.
They just don't have the time to fix it during a high-speed live broadcast.
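To put the 99% standard in perspective, here is a back-of-the-envelope sketch. It assumes a steady pace, which real speech never has, and borrows the 300 words per minute and three-and-a-half-hour figures mentioned earlier in the episode; the function name `expected_errors` is hypothetical.

```python
# Rough arithmetic only: assumes a constant speaking rate and a flat error
# rate, neither of which holds in a real broadcast.


def expected_errors(words_per_minute, minutes, accuracy_percent=99):
    """Expected number of wrong words at the given pace and accuracy."""
    total_words = words_per_minute * minutes
    return total_words * (100 - accuracy_percent) / 100


per_minute = expected_errors(300, 1)
marathon_shift = expected_errors(300, 210)  # three and a half hours
print(per_minute, marathon_shift)
```

Even at the industry standard, a fast broadcast leaves room for a few slips every minute, which is why moving past a typo is usually the better choice.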
We have the asterisk on my machine, which is the key in the very middle,
that can erase a mistake.
But nine times out of ten, you are not going to catch it fast enough
before it already goes out on the air.
And then if you try and take it back,
it's just gonna garble the captions up.
So it's better to just, if you make a mistake,
just ignore it and keep writing and move past it.
And then the faster it moves off the screen,
the faster people will forget about it.
Even after 21 years in the job,
Ryan has a few recurring issues.
I tend to drag my fingers.
So sometimes I will catch extra letters when I'm trying to
write certain words or I'll miss keys too.
Like if my fingernails are too long sometimes, I can't quite hit the keys right.
Sometimes you might notice the captions pause for a few moments or go blank.
This is likely because the captioner fell off pace and is trying to catch up.
This happens most often with news shows where the banter can be lightning fast.
Rachel Maddow, who hosts her own live show on MSNBC, has been clocked talking at up to 270 words per minute. A challenge for even the most seasoned captioner.
If you need to just let a sentence go and then catch up again, that's okay.
When you start paraphrasing though, then you take the risk of presenting the wrong information
or turning it into something that they didn't actually say.
And that's the last thing you want to do.
You don't want to put words in anybody's mouth.
The goal is to provide a text equivalent of as much of the audio as possible.
This can be particularly challenging when multiple people are speaking at once.
A lot of times it'll just be, you know, a couple of words in a dash,
and then the next person will be a couple of words in a dash.
Sometimes there's nothing you can do. If they're just screaming at each other,
there is nothing you can do, you know.
Once they figure it out, then you can keep going again.
Doug Karlovits, the general manager at Verbit, says certain TV shows pose more problems than
others.
Like The Osbournes, a reality show from the early 2000s that followed the aging and often
incomprehensible rock star Ozzy Osbourne and his family.
The debates around the office on what we thought he was saying on that show was good watercooler
conversations.
Well, first was, is he just putting this on?
Eventually as that show got renewed, you realize, no, that's how Ozzy talks.
It was really like, I think he said this. And then,
you know, people would go and come over, listen to this. What do you think he said? And, you know,
you would just sit there and I don't know. I don't know what he said. I don't think he knows what he
was saying.
There are also elements that require interpretation, like how to caption a noise or a
nonverbal vocalization. Some networks and studios are particular.
Disney reportedly has specific rules
about how R2-D2's mechanical noises should be captioned.
Netflix is fond of using the phrase wet squelching
to describe the sound of monsters
in the show Stranger Things.
For background noises and live captioning,
Ryan uses a list of templatized descriptions.
We call them parentheticals, so like bells tolling or applause, singing, chanting, things like that.
You want to try and be descriptive, but also you don't want to go overboard.
All of this effort is to ensure that people who are deaf or hard of hearing have equal access to media,
but captions have found a much broader audience. A 2022 survey by the language learning platform
Preply found that half of all viewers now watch media with captions on most of the time.
Some have speculated that's at least partly to do with modern sound mixing,
which alternates between loud sound effects
and quiet dialogue.
Game of Thrones, there was so much background noise occurring on that show that a lot of
the people started using captions.
But the most frequent users of captions are now younger people, particularly Gen Z. And
that has more to do with changes in the media landscape.
The younger viewers, they're watching it on their phones.
They're watching it on their iPads.
They're not necessarily listening, but they're reading it
as they're in class or they're at work
and don't wanna call attention to themselves.
Some publishers have estimated that up to 85%
of the videos they post on Facebook are
watched on mute.
Many short-form videos on social media sites now have captions coded directly into the
media file that can't be turned on or off.
That's because it's keeping that person who's looking, it's keeping their attention longer.
Some platforms, like YouTube, offer their own tools to creators
that use speech recognition
to generate captions automatically.
Karlovits says artificial intelligence
has already fundamentally changed
the captioning business.
Verbit offers automatic speech recognition
and generative AI tools
that are trained with diverse language models
to pick up on speech patterns.
Karlovits says these options cost much less than traditional transcription,
but they still aren't as accurate or precise as a human captioner.
And at least for now, many clients still prefer their captions to be generated by a human being,
like Katie Ryan.
Maybe a deaf person is in an area that there's tornadoes, and they turn on their local news.
We want those people to be able to have captioning that is as accurate and as clean as possible,
so they know what to do and they can be safe.
I will always advocate for a human captioner to be there to give the best service possible.
When you watch TV, do you always use the captions?
No. Never have captions on in my house.
Really?
Never, no.
I sit in front of a computer and deal with that all day.
I don't need to worry about it.
I'm off the clock.
For The Economics of Everyday Things, I'm Zachary Crockett.
This episode was produced by me and Sarah Lilley and mixed by Jeremy Johnston.
We had help from Daniel Moritz-Rabson, and thanks to our listeners Owen Roberts and David
Kennet for suggesting this topic.
If you have an idea for an episode, feel free to email us at everydaythings@freakonomics.com.
Our inbox is always open.
All right, until next week.
What if you're in the middle of like a live broadcast and you just really have to pee?
Now from my office to my bathroom is like ten steps, so I can make it.
The Freakonomics Radio Network.
The hidden side of everything.
Stitcher.