The Vergecast - The rise of the audio-only video game

Episode Date: May 12, 2024

In episode two of our Five Senses of Gaming miniseries, David Pierce dives into the world of hearing with audio-only video games with Paul Bennun, who has been in this space longer than most. Years ago, Bennun and his team at Somethin’ Else made a series of games called Papa Sangre that were among the most innovative and most popular games of their kind. He explains what makes an audio game work, why the iPhone 4 was such a crucial technological achievement for these games, and more. Email us at vergecast@theverge.com or call us at 866-VERGE11, we love hearing from you. Learn more about your ad choices. Visit podcastchoices.com/adchoices

Transcript
Starting point is 00:00:00 Support for the show comes from Retool. Too many companies run critical operations on duct-taped spreadsheets, Slack workflows, and whatever else they could cobble together. Not because they want to, but because building internal tools means weeks of waiting on someone else's backlog. That's where Retool comes in. Build custom internal tools just by describing what you need. Prompt something like,
Starting point is 00:00:22 build me a revenue dashboard on our Salesforce data. And Retool actually builds it on your company's data, in your cloud with enterprise security built in. Go to retool.com slash vergecast. We all need to retool how we build software. Welcome to the Vergecast, the flagship podcast of floating point math. I'm your friend David Pierce, and this is the second episode in our series, sponsored by Visible Wireless, about the five senses of video games.
Starting point is 00:00:54 If you missed last week's episode, which was the Touch episode, all about speed runners and the N64 joystick crisis of 2024, go back and listen. It's a super fun episode. This week's sense is hearing, sound. And I want to start by telling you about a game I've been playing a lot over the last few weeks. It's called Blind Drive. Start. What the hell?
Starting point is 00:01:17 Wait, hey! Here's the story of Blind Drive, which is available for lots of platforms, but I've been playing it on my phone. You're the main character. You thought you were taking part in a scientific study, but suddenly you find yourself driving a car against traffic with a blindfold on. You can't see anything. The only way to survive is by hearing the cars, trucks, and cops and bikers and whatever else coming at you and then tapping on the phone screen to steer away from them. The longer you make it, the higher your score, and the happier you make this scary sounding guy on the phone. I've never played a game quite like this before.
Starting point is 00:01:58 It's really intense and really fun, and I find myself way more focused on it because it's all based on sound. I'm on edge all the time playing it. It's kind of a lot, honestly. It's been fun to play this too because it feels so different than what I'm used to. We live in such visual times right now. We scroll vertical video. We look at our TVs. We put on VR headsets.
Starting point is 00:02:20 We play these huge, gorgeous open world games. There's just always so much to look at all the time. But there's something really powerful and different about audio. I mean, you get it. You listen to podcasts. Podcasts are proof of how cool and powerful audio can be. But I'd never really even thought about audio as more than just a feature of video games.
Starting point is 00:02:41 It's a way to make explosions sound more real, or at the very most, alert you to when someone is sneaking up behind you. But there's actually a rich history of audio games, video games without the pictures, and some reasons to think those games might be coming back. One of the people responsible for a big part of that history is this guy. That's right. As a baby, on a BBC Model B, because I'm that old,
Starting point is 00:03:04 and it was my first game, is, you know, in quotes, my first video game. Paul's route into audio games was a windy one that actually makes perfect sense in retrospect. He was really into games as a kid. He was writing code on a BBC Model B in the 80s by actually copying it out of a magazine. That was how you wrote code back then. Then Paul got into the music business and the broadcast radio business. And then he actually ended up directing a video game version of the game show You Don't Know Jack.
Starting point is 00:03:33 It's time for the show where high culture and pop culture All right, we got ourselves. One, two, three players on board the flight today. Making that game, he told me, was mostly about making the audio. All of the effort of that game sort of went into phenomenal writing, good game design, and smoke and mirrors that would make it seem like you were talking, that you, the player, were inside a live TV game show. And doing that kind of reawakened my love for technology and creative technology. And I've really been doing that ever since. At this point, Paul had built lots of different kinds of games, but he had always thought it would be cool to make one that was all audio, for a couple of reasons, actually.
Starting point is 00:04:18 There was a broadcaster in the UK called Channel 4. They were originally sort of looking for games that were accessible. The games that could be played by communities who were generally excluded from playing games. And so I didn't want to make a game for people with a quote-unquote disability. I wanted to make games that everyone could play and that everyone would love without any compromise. Eventually, he decides to do just that. Build an audio-only game. At this point, it's about 2010, which is important because there was a big technological development that happened in 2010. The iPhone 4 came out.
Starting point is 00:04:54 It had a bunch of new features like the Retina display and the selfie camera, and it eventually was the one that went through all that Antennagate stuff. Remember that? The whole Steve Jobs, you're holding it wrong thing. But for Paul, there was a significantly less publicized new feature that really, really, really mattered. iPhone 4 was the first of Apple's devices that had hardware acceleration for floating point math. Now, why is that important? It's important because we were doing real-time binaural synthesis on a mobile device. So let's talk about binaural audio.
Starting point is 00:05:26 I have two ears. Most people have two ears, but not everyone has two ears. And you might think that, therefore, that we listen in stereo. Well, we don't. The shape of your, the fleshy bit of your ear, the pinna, has got these folds in it, these lumps and folds. You probably know the basics of this. Sound hits your ears at different times. Your brain does a huge amount of work to both make all of that sound make sense and also to intuit where that sound is coming from based on when it hits all those lumps and folds. What it turns out that you can do is that you can get an artificial head with two good microphones where the ears would be and you can blast some white noise at that head from lots and lots of different directions. And you can work out how the human head hears those sounds from different sources. And you can express that in terms
Starting point is 00:06:16 of a big table of numbers. Now what I can do is I can take away the white noise and I can apply those numbers to an arbitrary bit of audio, an arbitrary audio file. And I can get the right number from where I want this sound to sound like where it's coming from. I'm way oversimplifying here, but basically once you can map sound the way Paul is describing, you can manipulate it. Delay sound coming from a spot on the right just a tiny bit, and it'll sound like the thing over there is further away. Increase sound in the front and decrease it behind, and boom, you have something right smack in front of your face. This is a simple enough concept, but it's really, really tricky math. And you're going to do that, by the way, 44,000 times a second, or 48,000 times a second,
Starting point is 00:06:59 because that's a typical sampling rate for digital audio. So as you can see, that requires a lot of math. Pre-iPhone 4, there wasn't a device that could do it. And Apple helped us out a lot, by the way, with the Accelerate framework, which came in at that point, which enabled us to do this. So it's actually pretty simple to, it's the same as in any video game, right? Imagine there's, you've got an X and Y coordinates. You've got a grid in front of you.
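To make the "big table of numbers" idea concrete, here is a minimal sketch in Python, not the code Papa Sangre used. It assumes the table maps a direction to a pair of short impulse responses (one per ear, often called head-related impulse responses) and that rendering means convolving a mono signal with each of them. The table values, the function names, and the 128-tap filter length used in the cost estimate are all invented for illustration.

```python
# Toy sketch of binaural rendering: look up a left/right impulse-response
# pair for a direction, then convolve a mono signal with each one.
# All table values below are invented for illustration, not measured data.

SAMPLE_RATE = 44_100  # samples per second, a typical rate for digital audio

# Hypothetical "big table of numbers": azimuth in degrees -> (left, right)
# impulse responses. A source on the right reaches the right ear first and
# louder; the left ear hears it two samples later and quieter.
HRIR_TABLE = {
    0: ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),    # straight ahead
    90: ([0.0, 0.0, 0.4], [1.0, 0.0, 0.0]),   # hard right
    -90: ([1.0, 0.0, 0.0], [0.0, 0.0, 0.4]),  # hard left
}


def convolve(signal, kernel):
    """Direct-form convolution: every input sample is spread across the kernel."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(kernel):
            out[i + j] += sample * tap
    return out


def render_binaural(mono, azimuth_deg):
    """Return (left, right) channels for a mono signal placed at azimuth_deg."""
    # Pick the table entry closest to the requested direction.
    nearest = min(HRIR_TABLE, key=lambda az: abs(az - azimuth_deg))
    left_ir, right_ir = HRIR_TABLE[nearest]
    return convolve(mono, left_ir), convolve(mono, right_ir)


def multiply_adds_per_second(taps, sample_rate=SAMPLE_RATE, ears=2):
    """Rough cost of direct convolution for one sound source."""
    return taps * sample_rate * ears


left, right = render_binaural([1.0, 0.5], azimuth_deg=80)
print(left, right)
print(multiply_adds_per_second(taps=128))
```

With an assumed 128-tap filter, one sound source alone costs about 11 million multiply-adds per second at 44.1 kHz, which gives a sense of why hardware floating point mattered. Real systems use measured responses, interpolate between directions, and usually convolve with FFT-based methods rather than this direct loop.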
Starting point is 00:07:23 There's X and Y. And as objects move relative to the player, you can change the X axis and you can change the y-axis and you can calculate where the object is. So it's just the same in that regard between a video game and an audio game. In a video game, you're just saying, well, you know, that the monster has walked diagonally away from you so now I can change the x-axis and the y-axis position and I can just render that object and, you know, I'm watching it on my screen, everyone's happy and you can do exactly the same thing with audio games. It's just that instead of rendering it using whatever technique you're using to render an object in video, you're just applying a bit of the
Starting point is 00:07:59 math to the frequencies of the audio, and now all of a sudden the human being hears it. It really hears it. The pictures are better on radio, as they say. What he's describing, I should say, is super normal now. It's just spatial audio, the idea that you can anchor something in place and then as you like turn your head or the object itself moves, the sound moves with it. Lots of devices have this now. And you can use it to all kinds of good and dumb effects. But at the time, it was revolutionary, especially when you also had a mobile phone with you that could accurately track your movement in space. We had to have an input scheme, an input control scheme in terms of changing the relative
Starting point is 00:08:37 position of the player and the audio that was incredibly precise. You could do it with a mouse, but there were some drawbacks to that. It's far, far better if the human being themselves is doing the turning, like the head is turning itself or the body is turning. And with Vision Pro and with the current generation of spatial AirPods and so on, well, we're used to that now. We understand that now. Back in the day, that didn't exist. So we kind of had to kind of create the hardware effect that you get from spatialized headphones without the existence of spatialized headphones. And the only way that we could really see to do that was with the accelerometer and the magnetometer inside iPhones. So with all this suddenly possible, Bennun and his team
Starting point is 00:09:22 at a company called Somethin’ Else set out to build an audio-only game. They ended up calling it Papa Sangre. or very soon you'll forget everything and everyone you've ever known. The plot of Papa Sangre is pretty simple. I'd feel bad spoiling it for you, but the game is 13 years old and you literally can't download it anymore. So here goes. You're dead. You're stuck in Papa Sangre's kingdom, and the only way out is by completing a journey through five different palaces.
Starting point is 00:09:55 All the bad guys and monsters are everywhere, and they're trying to stop you. All those bad guys and monsters respond to sound. So you have to move as quietly as you can while also looking for clues and talking to characters and generally trying to figure out what's going on and where to go. Because you can't see anything. There's nothing to see. The ice is thin. You need to be careful. Making a game like this work is on one level not that complicated. You can sort of do anything you want in audio. You can make audio Pong just by using spatial audio to help you understand where the paddle and ball are. You can make audio Tetris, Paul also says, which seems wild.
Starting point is 00:10:37 You can kind of make audio anything. In the second version of Papa Sangre, there was even a Duck Hunt-style shooting game where you just followed the sound of the ducks, and you turn and shoot the ducks at the right moment. Paul and his team built a few of these games over time before deciding that wasn't what they wanted to do. It didn't feel right. We got to take a break, and then I'll tell you why. Support for the show comes from Framer. Framer is an enterprise-grade no-code website builder
Starting point is 00:11:16 used by teams at companies like Perplexity and Miro to move faster. With real-time collaboration and a robust CMS, with everything you need for great SEO, not to mention advanced analytics that include integrated A/B testing, your designers and marketers are empowered to build and maximize your dot-com from day one. So whether you want to launch a new site, test a few landing pages,
Starting point is 00:11:40 or migrate your full dot-com, Framer has programs for startups, scale-ups, and large enterprises to make going from idea to live site as easy and fast as possible. Learn how you can get more out of your dot-com from a Framer specialist
Starting point is 00:11:56 or get started building for free today at Framer.com slash Verge for 30% off a Framer Pro annual plan. That's Framer.com slash verge for 30% off. Framer.com slash verge. Rules and restrictions may apply.
Starting point is 00:12:19 Welcome back. So at this point in our story, it's right around 2013. After working on Papa Sangre, the team at Somethin’ Else made a game called Audio Defense, which is just a straightforward shooter game. This is the kind of thing Paul meant when he talked about Audio Pong and Audio Tetris. It's just a normal game with normal mechanics, but with no pictures. In Audio Defense, there are waves of bad guys coming at you, you shoot them. That's the whole bit. And it's all done just in audio.
Starting point is 00:12:49 There are zombies. You can hear them. But you can't see them. Turn carefully until the zombie sound is in front of you. It was a very interesting game in that regard. It was a much more sort of casual game to pick up, put down. And it was effective and it worked. I think what we realized after releasing that game was the natural benefits of audio games meant that more artistic, more carefully considered, more complex experiences, certainly for our team, with our skills, would probably find a better market fit than that game. You would think that that game, if people liked the other Papa Sangre games, then Audio Defense would just find a larger audience. After all, it's the same kind of mechanic, but it's much, much more simple,
Starting point is 00:13:36 and it's much easier to understand. Funnily enough, that didn't happen. The game didn't do as well as the other games. The Somethin’ Else team really thought Audio Defense was going to be huge. But in retrospect, Paul seems to think that it kind of makes sense that it wasn't. Because the best audio games aren't just audio versions of video games. There are some cool accessibility upsides to that, for sure, letting more people get access to these kinds of games. But he's more intrigued by the idea of what you can only do in an audio game.
Starting point is 00:14:06 It's related to the fact that the earliest way that humans told stories to each other would of course be verbally orally around the campfire. And there is a mode and a pacing to that kind of storytelling and a way that words can be used to conjure up images and ideas in the listener that all of a sudden if you're just purely concentrating on in terms of an audio game, you can also use. And a story told orally is very different to a story told graphically. There are reasons of pacing.
Starting point is 00:14:39 There are reasons of how different ideas are introduced and worked on. But mainly, again, it's because there's a complicity between the listener and the storyteller that has a different kind of suspension of disbelief or a different kind of enactment of belief, a different kind of purposeful volition on the part of the audience. And it means that you have a, I'm not going to say it's better or worse. I personally think it's more interesting. You have a broader palette of things that you can do. Again, I've sort of said it's more like a novel, kind of the subtlety and the complexity of the ideas that you can introduce and develop are just different from what you can tend to do in video games.
Starting point is 00:15:20 There's this longstanding debate in gaming about the role of storytelling. There are those who think it's super important, that cutscenes are crucial, that you can do as much or more good narrative work in a game as you can in a movie. And then there are those who just smash the A button to get through cutscenes. I should confess, I'm mostly one of those people. But in an audio game, there seems to just be more space, more freedom to just let the story be the whole thing. Really, you have no choice but to be more thoughtful about it, or the whole thing will just devolve into chaos. So if I'm playing a video game, think about how many different pieces of information I can have on that screen at the same time. If you think about something like any modern, highly polished AAA game, you've typically got n number of different things about the status of you,
Starting point is 00:16:05 as a, let's say it's a first-person shooter or a first-person game. Anyway, you've got n number of things that demonstrate the status of your avatar of you in the game, plus you've got the relative position of n number of different objects, plus you've got the different choice of weapon or object in your hand. There's just hundreds of different pieces of information, you know, relative position of your other players and so on. And human beings aren't that great when it comes to discriminating, usefully discriminating lots of different sound sources at the same time. We tend to try and focus on a small amount, one or two sound sources at the same time. And there's an upper limit to how many you can really have simultaneously before the human
Starting point is 00:16:43 being is just going to go, well, you know, it's fine if I'm at a dinner party or if I'm in a nightclub or if I'm on the street because I have a bunch of other things in terms of situational context, knowledge that help me focus on the different things that are actually important at that moment. But if I'm creating an entirely artificial experience, then I have to be much, much, much more directed and much, much more careful in terms of what I'm presenting to the player for them to be able to focus on and get information, which is going to help them enjoy the system, because it needs to be fun. And trying to, you know, playing a game of what should I be listening to now is not fun. Normally, when you get to a big moment in a
Starting point is 00:17:24 video game, there is so much going on at all times. But here is a climactic moment in Papa Sangre. Just listen to how it goes. hogs, chicken meat, not your flesh. This all sounds like kind of a niche thing now, but Papa Sangre was a big hit, and Papa Sangre II even more so. When that second game, the sequel, came out in 2013, it was one of the best reviewed games of the year, with a 92% score on Metacritic. For most of the year, it was literally the best reviewed iPhone game of the year. And folks who classified themselves as blind lost their shit. Someone had gone out and had
Starting point is 00:18:07 tried to assemble a team of talented people to make an experience that had absolutely no compromise in terms of its playability and its enjoyment. I just went back and looked, by the way, and it was actually tied for number one in mobile games that year. The company behind it, Paul's company, Somethin’ Else, made a couple of other games and ended up selling to Sony. And then Sony, for contractual reasons Paul didn't really want to get into but did seem sort of sad about, just shut the whole project down.
Starting point is 00:18:36 That was almost a decade ago now, and Paul has gone on to lots of other things. But the more he and I talked, the more it became pretty obvious that he's still thinking about audio games. And he thinks that actually 2024, this incredibly visual world we live in, might be the perfect time for it. I'll tell you why right after the break. Support for the show comes from LinkedIn. If you're a small business owner, you know that every hire counts. But time and resources are limited. Finding, connecting with, and screening the right candidates takes up valuable time you could be giving to your customers.
Starting point is 00:19:23 That's where LinkedIn Hiring Pro comes in. It's built to be your hiring partner, helping you find the right candidates faster. That way you can hire with confidence without turning it into another full-time job. Hiring Pro streamlines the entire process from drafting your job to shortlisting candidates and conducting AI-powered interviews for initial screenings. Its updated conversational interface lets you describe what you need in plain language. Nearly 60% of hirers find a candidate to interview within a week. With Hiring Pro, you spend less time searching and more time connecting with the right talent.
Starting point is 00:20:00 And instead of getting buried in resumes, you get a focused shortlist that actually moves your hiring forward. Join the 2.7 million small businesses using LinkedIn to hire. Get started by posting your job for free at LinkedIn.com slash track. Terms and conditions apply. Welcome back. So let's go back to 2010 when the iPhone 4 came out
Starting point is 00:20:27 and it was this perfect confluence of technology that made a really cool spatial audio game possible for the first time. Remember that moment? Well, Paul Bennun thinks we're at another one of those moments right now. You know, I'm genuinely,
Starting point is 00:20:39 I've never been more excited about the possibilities for spatial audio in games. I really haven't. There was this alchemy that happened at the beginning of the Papa Sangre titles where a creative idea and technology and this kind of the ecosystem around it sort of came together to make those
Starting point is 00:20:52 games happen. And there's another point like that right now. Spatial computing, for example, in quotes, VR. Meta has sold more Quest 3s than Xboxes in the last couple of years. The hardware is out there for experiences like this. And by hardware, I also mean things like small headphones that have really good noise canceling. You're the first person I've ever talked to who thinks about the Vision Pro as an audio device. And I love that. But I'm curious, of how you think about what that could be. So I think it's interesting, if you look at the difference, like the philosophical difference between something like Vision Pro and Meta's excellent hardware, which has got lots and
Starting point is 00:21:29 lots of benefits in different ways. But the overall difference where Apple has really concentrated has been on being able to bring virtual objects into the space that you are in. It literally is spatial computing is their shtick. And it's very good at that. And those objects don't need to be visual. They can be audible and still be in line with Apple's design philosophy. And if you think about what the early Papa Sangre games were about, that's what they were about.
Starting point is 00:21:56 They were about objects that you couldn't see in the physical space that you were in. You know, that's how those games worked. It's literally how those games worked. And that fits very closely with Apple's design principles for Vision Pro. It doesn't exclude it from things like Quest as well. And I'm sure that if and when the Quest Pro 2 comes out or the new hardware that we know is licensing Horizon OS, I'm sure that some of that's going to be able to compete in terms of low latency, high definition, pass through
Starting point is 00:22:26 video. So we'll see how long that advantage is maintained. But these devices are great for the kind of thing that we're talking about, I would say. I think they're phenomenal. They're a little heavier, right? Sure. And you don't need to have that thing strapped to your head. The AirPods Pro that I'm wearing right now are an ideal solution for this kind of thing. Yeah, what are the kind of specs that matter there? I mean, you know, again, as we talk about screens, we're in this incredible GPU race and everybody's trying to make the higher density screens with higher refresh rates. I'm sure there are other kind of measurable specs that are getting better over time. But what are those things that you would look at as sort of raw technical advancements?
Starting point is 00:23:04 I think that we've reached the stage where that's no longer an issue. Okay. In terms of speakers in your ears, that's no longer an issue. Bluetooth, I mentioned earlier on that you need to have an output that is proportional to input, and latency is very important. But Apple has targeted, I think it's 12 milliseconds, or it might be 8 milliseconds or something like that in terms of response to input. And that is fine. It's fine. It's more than just the audio stuff, too.
Starting point is 00:23:30 Think about what you can do with great voice recognition and generative AI that can create a personalized game script on the fly. Or the fact that your phone's location services work indoors now. so you can actually move through a game by moving through your living room. And think about what's even possible in multiplayer gaming, when instead of rendering someone's weird avatar with no legs or a crazy looking face, you can just bring their voice into the space with you. To stumble across your friend in a space and find out that they're there
Starting point is 00:24:00 is the kind of surprise and the kind of magic that would be phenomenal in one of these games and the ability to shout to your friend that you need some help. and not just like if you're playing Call of Duty, but in a way that's weirdly more practical because it's the only way that you can communicate. Like if I'm playing Call of Duty, of course I can leave some, I can leave some loot for anyone, for a random person. But if I'm running out of ammunition or something,
Starting point is 00:24:24 or you have a resource which I need to unlock this door in the next five seconds, and I have the resource, the tension of me throwing that object to you or getting over to you to give you that object is a set of really interesting things that are fun to explore in a playful experience. They're absolutely on the roadmap.
Starting point is 00:24:39 One game I think a lot about in this context is Zombies, Run!, which is sort of a hybrid of an exercise app and a video game. The idea is that the game tells a story as you run, and then when the zombies get too close, you have to pick up the pace and run faster. And the game actually knows and responds to how fast you're moving. Yeah, go. Raise the gates, please. You know what? Just raise the gates. All righty there. I love Zombies, Run!, by the way.
Starting point is 00:25:05 Cannot recommend it highly enough. And now I'm just imagining like an audio AR game that follows you around as you go about your day and then every once in a while just adds zombies and makes you hide or run away. I don't know, maybe your whole office could play together and it would interrupt all your meetings. This sounds awesome to me and it only works in audio. There really haven't been a lot of great audio games in recent years. People in the community still talk a lot about Papa Sangre actually and how sad they are that it's gone. There are a few games out there, though. As I mentioned earlier, Blind Drive is pretty fun, and I also played one
Starting point is 00:25:39 called Feer, F-E-E-R, that is another of those endless runner-style games like Subway Surfers or Temple Run, where you dodge enemies and obstacles and just try to make it as long as you can. But the trick here is, of course, it's all audio. These games are really different from what I'm used to, but they're really fun. And at least if you believe Paul, the audio game is due for a comeback. It's a thing which we know about now. It's a thing that we are culturally ready for in a way that we weren't. And you've also identified the fact that visual culture, visual entertainment on the same device that we would be putting this on or similar devices is also much, much more addictive, much, much harder to compete with. So all those things
Starting point is 00:26:22 are true. I think that the scale, the sheer volume of devices and hardware that's in the market, but we can't compete with TikTok, but we can compete with podcasts. We can compete with audio. And we can also be a part of the gaming economy, ecology, landscape. So all of that makes me think that it absolutely is worthwhile doing. Exciting to explore. All right, that's it for us today. Thanks to Paul for being here and thank you as always for listening. This is the second in our four-part series about the five senses of gaming. So make sure you heard the touch episode from last week. In the meantime, we'll be back on Tuesday and Friday with our regularly scheduled programming. This show is produced by Andrew Marino, Liam James, and Will Poor. The Vergecast
Starting point is 00:27:09 is a Verge production and part of the Vox Media Podcast Network. We'll be back on Tuesday to talk about iPads, right to repair, and a question for the Vergecast hotline. We'll see you then. Rock and roll.
