The Vergecast - The rise of the audio-only video game
Episode Date: May 12, 2024
In episode two of our Five Senses of Gaming miniseries, David Pierce dives into the world of hearing with audio-only video games with Paul Bennun, who has been in this space longer than most. Years ago, Bennun and his team at Somethin’ Else made a series of games called Papa Sangre that were among the most innovative and most popular games of their kind. He explains what makes an audio game work, why the iPhone 4 was such a crucial technological achievement for these games, and more. Email us at vergecast@theverge.com or call us at 866-VERGE11, we love hearing from you. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Transcript
Welcome to the Vergecast, the flagship podcast of Floating Point Math.
I'm your friend David Pierce, and this is the second episode in our series,
sponsored by Visible Wireless, about the five senses of video games.
If you missed last week's episode, which was the Touch episode,
all about speed runners and the N64 joystick crisis of 2024, go back and listen.
It's a super fun episode.
This week's sense is hearing, sound.
And I want to start by telling you about a game I've been playing a lot over the last few weeks.
It's called Blind Drive.
Start.
What the hell?
Wait, hey!
Here's the story of Blind Drive, which is available for lots of platforms, but I've been playing it on my phone.
You're the main character.
You thought you were taking part in a scientific study, but suddenly you find yourself driving a car against traffic with a blindfold on.
You can't see anything.
The only way to survive is by hearing the cars, trucks, and cops and bikers and whatever else coming at you and then tapping on the phone screen to steer away from them.
The longer you make it, the higher your score, and the happier you make this scary sounding guy on the phone.
I've never played a game quite like this before.
It's really intense and really fun, and I find myself way more focused on it because it's all based on sound.
I'm on edge all the time playing it.
It's kind of a lot, honestly.
It's been fun to play this too because it feels so different than what I'm used to.
We live in such visual times right now.
We scroll vertical video.
We look at our TVs.
We put on VR headsets.
We play these huge, gorgeous open world games.
There's just always so much to look at all the time.
But there's something really powerful and different about audio.
I mean, you get it.
You listen to podcasts.
Podcasts are proof of how cool and powerful audio can be.
But I'd never really even thought about audio as more than just a feature of video games.
It's a way to make explosions sound more real,
or at the very most, alert you to when someone is sneaking up behind you.
But there's actually a rich history of audio games,
video games without the pictures,
and some reasons to think those games might be coming back.
One of the people responsible for a big part of that history is this guy.
That's right.
As a baby, on a BBC Model B, because I'm that old,
and it was, you know, in quotes, my first video game.
Paul's route into audio games was a windy one that actually makes perfect sense in retrospect.
He was really into games as a kid.
He was writing code on a BBC Model B in the 80s by actually copying it out of a magazine.
That was how you wrote code back then.
Then Paul got into the music business and the broadcast radio business.
And then he actually ended up directing a video game version of the game show You Don't Know Jack.
It's time for the show where high culture and pop culture
All right, we got ourselves one, two, three players on board the flight today.
Making that game, he told me, was mostly about making the audio.
All of the effort of that game sort of went into phenomenal writing, good game design,
and smoke and mirrors that would make it seem like you were talking, that you, the player, were inside a live TV game show.
And doing that kind of reawakened my love for technology and creative technology.
And I've really been doing that ever since.
At this point, Paul had built lots of different kinds of games, but he had always thought it would be cool to make one that was all audio, for a couple of reasons, actually.
There was a broadcaster in the UK called Channel 4. They were originally sort of looking for games that were accessible.
The games that could be played by communities who were generally excluded from playing games.
And so I didn't want to make a game for people with a quote-unquote disability.
I wanted to make games that everyone could play and that everyone would love without any compromise.
Eventually, he decides to do just that.
Build an audio-only game.
At this point, it's about 2010, which is important because there was a big technological development that happened in 2010.
The iPhone 4 came out.
It had a bunch of new features like the Retina display and the selfie camera, and it eventually was the one that went through all that Antennagate stuff.
Remember that?
The whole Steve Jobs, you're holding it wrong thing.
But for Paul, there was a significantly less publicized new feature that really, really, really mattered.
iPhone 4 was the first of Apple's devices that had hardware acceleration for floating point math.
Now, why is that important?
It's important because we were doing real-time binaural synthesis on a mobile device.
So let's talk about binaural audio.
I have two ears.
Most people have two ears, but not everyone has two ears.
And you might think, therefore, that we listen in stereo.
Well, we don't. The shape of your, the fleshy bit of your ear, the pinna, has got these folds in it, these lumps and folds.
You probably know the basics of this. Sound hits your ears at different times. Your brain does a huge amount of work to both make all of that sound make sense and also to intuit where that sound is coming from based on when it hits all those lumps and folds.
What it turns out that you can do is that you can get an artificial head with two good microphones where the ears would be and you can play
some white noise at that head from lots and lots of different directions. And you can work out
how the human head hears those sounds from different sources. And you can express that in terms
of a big table of numbers. Now what I can do is I can take away the white noise and I can apply
those numbers to an arbitrary bit of audio, an arbitrary audio file. And I can get the right number
from where I want this sound to sound like where it's coming from. I'm way oversimplifying here,
but basically once you can map sound the way Paul is describing, you can manipulate it.
Delay sound coming from a spot on the right just a tiny bit, and it'll sound like the thing over there
is further away. Increase sound in the front and decrease it behind, and boom, you have something
right smack in front of your face. This is a simple enough concept, but it's really, really tricky math.
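As a rough illustration of the "big table of numbers" idea, here is a minimal Python sketch. It assumes the table for one direction boils down to a short list of numbers per ear (an impulse response) that gets applied to a mono sound by convolution. The names `convolve`, `spatialize`, `HRIR_LEFT_EAR`, and `HRIR_RIGHT_EAR` are hypothetical, and the values are made up for the example rather than measured from a real dummy head. It is not the method the Papa Sangre team used, just the general shape of the technique.

```python
# Toy illustration only: the two short "tables of numbers" below are invented,
# not measured from a real artificial head.

def convolve(signal, impulse_response):
    """Apply an impulse response to a signal by direct convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Hypothetical responses for a source off to the listener's right:
# the right ear hears it immediately and loudly,
# the left ear hears it two samples later and more quietly.
HRIR_RIGHT_EAR = [1.0, 0.3]
HRIR_LEFT_EAR = [0.0, 0.0, 0.5, 0.2]

def spatialize(mono):
    """Return (left, right) channels for a mono signal placed to the right."""
    return convolve(mono, HRIR_LEFT_EAR), convolve(mono, HRIR_RIGHT_EAR)

# A single click as the "arbitrary bit of audio".
left, right = spatialize([1.0, 0.0, 0.0])
```

A real system would hold one such pair of responses for every measured direction and pick, or blend between, the pairs nearest to where the sound should appear to come from.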
And you're going to do that, by the way, 44,000 times a second, or 48,000 times a second,
because that's a typical sampling rate for digital audio.
So as you can see, that requires a lot of math.
Pre-iPhone 4, there wasn't a device that could do it.
And Apple helped us out a lot, by the way, with the Accelerate Framework, which came
in at that point, which enabled us to do this.
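To get a feel for how much math that is, here is a back-of-the-envelope Python sketch. It uses the standard 44,100 Hz CD sampling rate (the "44,000" figure rounded), and it assumes each sound source is filtered once per ear with a fixed-length filter. The function name `multiply_adds_per_second`, the filter length of 128, and the count of 8 simultaneous sources are illustrative guesses, not figures from the episode.

```python
def multiply_adds_per_second(sample_rate_hz, filter_taps, sources, ears=2):
    """Estimate multiply-add operations per second for direct per-ear filtering."""
    return sample_rate_hz * filter_taps * sources * ears

# 128 taps and 8 sources are assumed values chosen only for illustration.
ops = multiply_adds_per_second(44_100, 128, 8)
print(ops)  # 90316800, roughly 90 million multiply-adds every second
```

Even with these modest made-up numbers the workload lands in the tens of millions of floating point operations per second, which is why hardware support for that kind of math mattered.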
So it's actually pretty simple to, it's the same as in any video game, right?
Imagine there's, you've got an X and Y coordinates.
You've got a grid in front of you.
There's X and Y.
And as objects move relative to the player, you can change the X axis and you can change the
y-axis and you can calculate where the object is. So it's just the same in that regard between a
video game and an audio game. In a video game, you're just saying, well, you know, that the monster
has walked diagonally away from you so now I can change the x-axis and the y-axis position and I can
just render that object and, you know, I'm watching it on my screen, everyone's happy and you can do
exactly the same thing with audio games. It's just that instead of rendering it using whatever
technique you're using to render an object in video, you're just applying a bit
of math to the frequencies of the audio, and now all of a sudden the human being hears it.
It really hears it. The pictures are better on radio, as they say.
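The grid idea above can be sketched in a few lines of Python. This assumes a flat 2D world where the player has a position and a facing direction, and it turns an object's X and Y coordinates into the two things an audio renderer would need: how far away the sound is and at what angle it sits relative to the player's nose. The function name `relative_bearing` and the angle conventions are choices made for this example, not anything described in the episode.

```python
import math

def relative_bearing(player_xy, heading_deg, source_xy):
    """Return (azimuth_deg, distance) of a sound source relative to the player.

    heading_deg: direction the player faces, 0 = +y axis, clockwise positive.
    azimuth_deg: 0 = straight ahead, positive = to the right, in [-180, 180).
    """
    dx = source_xy[0] - player_xy[0]
    dy = source_xy[1] - player_xy[1]
    distance = math.hypot(dx, dy)
    world_bearing = math.degrees(math.atan2(dx, dy))  # 0 = +y, 90 = +x
    azimuth = (world_bearing - heading_deg + 180.0) % 360.0 - 180.0
    return azimuth, distance

# A monster directly to the player's right, one unit away.
azimuth, distance = relative_bearing((0.0, 0.0), 0.0, (1.0, 0.0))

# The player turns to face the monster, so it is now straight ahead.
azimuth_after_turn, _ = relative_bearing((0.0, 0.0), 90.0, (1.0, 0.0))
```

The azimuth would then be used to choose how the sound is filtered for each ear, and the distance to set how loud it is.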
What he's describing, I should say, is super normal now. It's just spatial audio,
the idea that you can anchor something in place and then as you like turn your head or the object
itself moves, the sound moves with it. Lots of devices have this now. And you can use it to all
kinds of good and dumb effects. But at the time, it was revolutionary, especially when you also
had a mobile phone with you that could accurately track your movement in space.
We had to have an input scheme, an input control scheme in terms of changing the relative
position of the player and the audio that was incredibly precise. You could do it with a mouse,
but there were some drawbacks to that. It's far, far better if the human being themselves is
doing the turning, like the head is turning itself or the body is turning. And with Vision
Pro and with the current generation of spatial AirPods and so on, well, we're used to that now. We
understand that now. Back in the day, that didn't exist. So we kind of had to kind of create
the hardware effect that you get from spatialized headphones without the existence of spatialized
headphones. And the only way that we could really see to do that was with the accelerometer
and the magnetometer inside iPhones. So with all this suddenly possible, Bennun and his team
at a company called Somethin’ Else set out to build an audio-only game. They ended up calling it Papa
Sangre.
or very soon you'll forget everything and everyone you've ever known.
The plot of Papa Sangre is pretty simple.
I'd feel bad spoiling it for you, but the game is 13 years old and you literally can't download it anymore.
So here goes.
You're dead.
You're stuck in Papa Sangre's kingdom, and the only way out is by completing a journey through five different palaces.
All the bad guys and monsters are everywhere, and they're trying to stop you.
All those bad guys and monsters respond
to sound. So you have to move as quietly as you can while also looking for clues and talking
to characters and generally trying to figure out what's going on and where to go. Because you can't
see anything. There's nothing to see. The ice is thin. You need to be careful. Making a game
like this work is on one level not that complicated. You can sort of do anything you want in audio.
You can make audio Pong just by using spatial audio to help you understand where the paddle and ball are.
You can make audio Tetris, Paul also says, which seems wild.
You can kind of make audio anything.
In the second version of Papa Sangre, there was even a Duck Hunt-style shooting game
where you just follow the sound of the ducks, and you turn and shoot the ducks at the right moment.
Paul and his team built a few of these games over time before deciding that wasn't what they wanted to do.
It didn't feel right.
We got to take a break, and then I'll tell you why.
Welcome back.
So at this point in our story, it's right around 2013.
After working on Papa Sangre, the team at Somethin’ Else made a game called Audio Defense,
which is just a straightforward shooter game.
This is the kind of thing Paul meant when he talked about Audio Pong and Audio Tetris.
It's just a normal game with normal mechanics, but with no pictures.
In Audio Defense, there are waves of
bad guys coming at you, you shoot them. That's the whole bit. And it's all done just in audio.
There are zombies. You can hear them. But you can't see them. Turn carefully until the zombie
sound is in front of you. It was a very interesting game in that regard. It was a much more
sort of casual game to pick up, put down. And it was effective and it worked. I think what we
realized after releasing that game was that the natural benefits of audio games
meant that more artistic, more carefully considered, more complex experiences, certainly for our
team, with our skills, would probably find a better market fit than that game. You would think
that that game, if people liked the other Papa Sangre games, that Audio Defense would just find
a larger audience. After all, it's the same kind of mechanic, but it's much, much more simple,
and it's much easier to understand. Funnily enough, that didn't happen. The game didn't do as well
as the other games.
The Somethin’ Else team really thought Audio Defense was going to be huge.
But in retrospect, Paul seems to think that it kind of makes sense that it wasn't.
Because the best audio games aren't just audio versions of video games.
There are some cool accessibility upsides to that, for sure, letting more people get access
to these kinds of games.
But he's more intrigued by the idea of what you can only do in an audio game.
It's related to the fact that the earliest way that humans told stories to each other
would of course be verbally orally around the campfire.
And there is a mode and a pacing to that kind of storytelling
and a way that words can be used to conjure up images and ideas in the listener
that all of a sudden if you're just purely concentrating on in terms of an audio game,
you can also use.
And a story told orally is very different to a story told graphically.
There are reasons of pacing,
there are reasons of how different ideas are introduced and worked on.
But mainly, again, it's because there's a complicity between the listener and the storyteller
that has a different kind of suspension of disbelief or a different kind of enactment of belief,
a different kind of purposeful volition on the part of the audience.
And it means that you have a, I'm not going to say it's better or worse.
I personally think it's more interesting.
You have a broader palette of things that you can do.
Again, I've sort of said it's more like a novel, kind of the subtlety and the complexity of the ideas that you can introduce and develop are just different from what you can tend to do in video games.
There's this longstanding debate in gaming about the role of storytelling.
There are those who think it's super important, that cutscenes are crucial, that you can do as much or more good narrative work in a game as you can in a movie.
And then there are those who just smash the A button to get through cutscenes.
I should confess, I'm mostly one of those people.
But in an audio game, there seems to just be more space, more freedom to just let the story be the whole thing.
Really, you have no choice but to be more thoughtful about it, or the whole thing will just devolve into chaos.
So if I'm playing a video game, think about how many different pieces of information I can have on that screen at the same time.
If you think about something like any modern, highly polished AAA game, you've typically got n number of different things about the status of you,
as a, let's say it's a first-person shooter or a first-person game. Anyway, you've got
n number of things that demonstrate the status of your avatar of you in the game, plus you've got
the relative position of n number of different objects, plus you've got the different choice
of weapon or object in your hand. There's just hundreds of different pieces of information,
you know, relative position of your other players and so on. And human beings aren't that great
when it comes to discriminating, usefully discriminating lots of different sound sources at the same
time. We tend to try and focus on a small amount, one or two sound sources at the same time.
And there's an upper limit to how many you can really have simultaneously before the human
being is just going to go, well, you know, it's fine if I'm at a dinner party or if I'm in a
nightclub or if I'm on the street because I have a bunch of other things in terms of
situational context, knowledge that help me focus on the different things that are actually
important at that moment. But if I'm creating an entirely artificial experience, then I have
to be much, much, much more directed and much, much more careful in terms of what I'm presenting
to the player for them to be able to focus on and get information, which is going to help
them enjoy the system, because it needs to be fun. And trying to, you know, playing a game
of what should I be listening to now is not fun. Normally, when you get to a big moment in a
video game, there is so much going on at all times. But here is a climactic moment in Papa Sangre.
Just listen to how it goes.
hogs, chicken meat, not your flesh.
This all sounds like kind of a niche thing now, but Papa Sangre was a big hit, and Papa Sangre II
even more so. When that second game, the sequel, came out in 2013, it was one of the best
reviewed games of the year, with a 92% score on Metacritic.
For most of the year, it was literally the best reviewed iPhone game of the year.
And folks who classified themselves as blind lost their shit. Someone had gone out and had
tried to assemble a team of talented people to make an experience that had absolutely no compromise
in terms of its playability and its enjoyment.
I just went back and looked, by the way, and it was actually tied for number one in mobile
games that year.
The company behind it, Paul's company, Somethin’ Else, made a couple of other games and
ended up selling to Sony.
And then Sony, for contractual reasons Paul didn't really want to get into but did seem
sort of sad about, just shut the whole project down.
That was almost a decade ago now, and Paul has gone on to lots of other things.
But the more he and I talked, the more it became pretty obvious that he's still thinking about audio games.
And he thinks that actually 2024, this incredibly visual world we live in, might be the perfect time for it.
I'll tell you why right after the break.
Welcome back.
So let's go back to 2010, when the iPhone 4 came out and it was this perfect confluence of technology that made a really cool spatial audio game possible for the first time.
Remember that moment?
Well, Paul Bennun thinks we're at another one of those moments right now.
You know, I'm genuinely, I've never been more excited about the possibilities for spatial audio in games.
I really haven't.
There was this alchemy that happened at the beginning of the Papa Sangre titles where a creative idea
and technology and this kind of the ecosystem around it sort of came together to make those
games happen. And there's another point like that right now. Spatial computing, for example,
in quotes, VR. Meta has sold more Quest 3s than Xboxes in the last couple of years.
The hardware is out there for experiences like this. And by hardware, I also mean things like
small headphones that have really good noise canceling. You're the first person I've ever
talked to who thinks about the Vision Pro as an audio device. And I love that. But I'm curious
how you think about what that could be.
So I think it's interesting, if you look at the difference, like the philosophical difference
between something like Vision Pro and Meta's excellent hardware, which has got lots and
lots of benefits in different ways.
But the overall difference, where Apple has really concentrated, has been on being able to bring
virtual objects into the space that you are in.
It literally is spatial computing is their shtick.
And it's very good at that.
And those objects don't need to be visual.
They can be audible and still be in line with Apple's design philosophy.
And if you think about what the early Papa Sangre games were about, that's what they were about.
They were about objects that you couldn't see in the physical space that you were in.
You know, that's how those games worked.
It's literally how those games worked.
And that fits very closely with Apple's design principles for Vision Pro.
It doesn't exclude it from things like Quest as well.
And I'm sure that if and when the
Quest Pro 2 comes out, or the new hardware that we know is licensing Horizon OS, I'm sure that
some of that's going to be able to compete in terms of low-latency, high-definition passthrough
video. So we'll see how long that advantage is maintained. But these devices are great for the
kind of thing that we're talking about, I would say. I think they're phenomenal. They're a little
heavier, right? Sure. And you don't need to have that thing strapped to your head. The AirPods Pro
that I'm wearing right now, an ideal solution for this kind of thing. Yeah, what are the kind of
specs that matter there? I mean, you know, again, as we talk about screens, we're in this
incredible GPU race and everybody's trying to make the higher density screens with higher
refresh rates. I'm sure there are other kind of measurable specs that are getting better over
time. But what are those things that you would look at as sort of raw technical advancements?
I think that we've reached the stage where that's no longer an issue. Okay. In terms of speakers in
your ears, that's no longer an issue. Bluetooth, I mentioned earlier on that you need to have
an output that is proportional to input, and latency is very important.
But Apple has targeted, I think it's 12 milliseconds, or it might be 8 milliseconds or something
like that in terms of response to input.
And that is fine.
It's fine.
It's more than just the audio stuff, too.
Think about what you can do with great voice recognition and generative AI that can create
a personalized game script on the fly.
Or the fact that your phone's location services work indoors now,
so you can actually move through a game by moving through your living room.
And think about what's even possible in multiplayer gaming,
when instead of rendering someone's weird avatar with no legs or a crazy looking face,
you can just bring their voice into the space with you.
To stumble across your friend in a space and find out that they're there
is the kind of surprise and the kind of magic that would be phenomenal in one of these games
and the ability to shout to your friend that you need some help,
and not just like if you're playing Call of Duty,
but in a way that's weirdly more practical
because it's the only way that you can communicate.
Like if I'm playing Call of Duty, of course I can leave some,
I can leave some loot for anyone, for a random person.
But if I'm running out of ammunition or something,
or you have a resource which I need to unlock this door
in the next five seconds,
and I have the resource,
the tension of me throwing that object to you
or getting over to you to give you that object
is a set of really interesting things
that are fun to explore in a playful experience.
They're absolutely on the roadmap.
One game I think a lot about in this context is Zombies, Run!, which is sort of a hybrid of an exercise app and a video game.
The idea is that the game tells a story as you run, and then when the zombies get too close, you have to pick up the pace and run faster.
And the game actually knows and responds to how fast you're moving.
Yeah, go.
Raise the gates, please.
You know what? Just raise the gates.
All righty there.
I love Zombies, Run!, by the way.
Cannot recommend it highly enough.
And now I'm just imagining like an audio
AR game that follows you around as you go about your day and then every once in a while just
adds zombies and makes you hide or run away. I don't know, maybe your whole office could play
together and it would interrupt all your meetings. This sounds awesome to me and it only works in
audio. There really haven't been a lot of great audio games in recent years. People in the community
still talk a lot about Papa Sangre actually and how sad they are that it's gone. There are a few
games out there, though. As I mentioned earlier, Blind Drive is pretty fun, and I also played one
called Feer, F-E-E-R, that is another of those endless runner-style games like Subway Surfers
or Temple Run, where you dodge enemies and obstacles and just try to make it as long as you can.
But the trick here is, of course, it's all audio. These games are really different from what I'm
used to, but they're really fun. And at least if you believe Paul, the audio game is due for a
comeback. It's a thing which we know about now. It's a thing that
we are culturally ready for in a way that we weren't. And you've also identified the fact that
visual culture, visual entertainment on the same device that we would be putting this on or similar
devices is also much, much more addictive, much, much harder to compete with. So all those things
are true. I think that the scale, the sheer volume of devices and hardware that's in the market,
but we can't compete with TikTok, but we can compete with podcasts. We can compete with audio. And we can
also be a part of the gaming economy, ecology, landscape. So all of that makes me think that it
absolutely is worthwhile doing. Exciting to explore. All right, that's it for us today. Thanks to Paul
for being here and thank you as always for listening. This is the second in our four-part series about
the five senses of gaming. So make sure you heard the touch episode from last week. In the meantime,
we'll be back on Tuesday and Friday with our regularly scheduled programming. This show is produced
by Andrew Marino, Liam James, and Will Poor. The Vergecast
is a Verge production and part of the Vox Media Podcast Network.
We'll be back on Tuesday to talk about iPads,
right to repair, and a question for the Vergecast hotline.
We'll see you then.
Rock and roll.
