Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 181 | Peter Dodds on Quantifying the Shape of Stories

Starting point is 00:00:49 Hello, everyone. Welcome to the Mindscape podcast. I'm your host, Sean Carroll. For today, we're going to go into big data and storytelling. So I remember once hearing a conversation or maybe reading about it online where someone was complaining the usual complaint

Starting point is 00:01:15 about Hollywood movies, how they're sort of lowbrow and predictable, but frustratingly popular compared to more intellectual fare. And someone else gave the counterargument saying that's because Hollywood does what Aristotle told storytellers to do 2,500 years ago, to have a structure, a three-act structure, denouement, conflict, the whole bit.

Starting point is 00:01:39 And there's something to that. You know, on the one hand, there's plenty of different kinds of stories. They don't all follow the Hollywood three-act structure. On the other hand, it can, meanwhile, I should say, it can be a little bit predictable, but it nevertheless works.

Starting point is 00:01:53 You know, there's something about that kind of story that hooks us, that carries us along. And there are other stories that carry us along in different ways, and other people are fans of those. So given this feeling that stories, which are these quintessentially human artifacts, right, given this idea that they have structure somewhere, wouldn't it be great to do science to that idea, to try to tease out using data, using some kind of collection of information, whether or not real-world stories, the stories that we'd like to listen to, and the stories that we tell both spontaneously and from great planning have different kinds of structure of this form.

Starting point is 00:02:32 So that's what today's guest does. Peter Dodds is a statistician at the University of Vermont, who studies big data kinds of problems in many different contexts, from earth sciences to language to ecology. But he's one of the heads of what is called the Computational Story Lab. And what they do is they consider individual words and they rank them. They rank these different words, or they have people rank them, in all sorts of different ways, right? Different valences: are the words happy or sad? Are they strong or weak, et cetera? And then they ask, how important are these different rankings? How much correlation is there between different kinds of axes upon which different words have these values? And they try to seek out, using math, what are the most important aspects that

Starting point is 00:03:20 words can have playing a role in a story? It turns out there's a two-dimensional framework, which is very nice. Words go from a spectrum of weak to powerful and also from safe to dangerous. Those are the two aspects. Those are the two axes that matter the most for the impact that words have in stories. And then you can plot real stories, whether it's novels or screenplays. For that matter, you can plot things like the emotional state of the world by looking at Twitter or looking at other social media. I'm not going to give away all of the answers here, but Peter does a good job of explaining how stories do have structure. It's not just our imagination. We're not just imprinting structure on it from inside ourselves. There's a real sense in which successful stories

Starting point is 00:04:07 have a certain kind of flow. And it's fascinating to look at why people respond to the stories in different ways, which you can look at on Twitter, right? What kinds of events are causing people to be happy or sad, to take refuge in words that are powerful or dangerous or weak or safe or whatever? So this is very early days, I think, for this kind of work. It's very difficult. Language, humanity, meaning, it's all there. But we're beginning to have these big data sets that let us ask these questions in really new ways. So it's going to be exciting to see what comes out of this kind of work.

Starting point is 00:04:42 So let's go. Peter Dodds, welcome to the Mindscape Podcast. Thanks. Great to be here. I think a good starting point for this, because as we just said seconds ago, before we started recording, there's a lot to cover. But I love your invocation of the famous Kurt Vonnegut lecture about the shapes of stories. And in some sense, you're taking that idea, the shapes of stories and quantifying it.

Starting point is 00:05:22 You know, like being a good scientist, using the big data techniques to nail down some numbers. Is that more or less accurate? It's a partial thing that you do, of course, but is that an accurate way of saying one of the things you're aiming to do? Yeah, I mean, I have this kind of layout for basic science and there are sort of two pieces, fundamental pieces, which are describe and explain, just to sort of make it really simple. And I do that because I think it helps students understand which part they're acting on. But coming into that is taste and what do you choose to work on and what's meaningful, what, you know, and that's hard, right? But it really matters

Starting point is 00:06:00 tremendously. You know, and sometimes you're sort of a bit nervous for maybe years about when the things will matter. I mean, you know, it's the game, right? But stories to me have become more and more, sort of foremost in my mind of just this incredibly important aspect of being a human and how cultures work and so on. And I know many different fields and just people in general easily have come to that, right? Religions, politics, it's all there. I came to it from social contagion trying to think about how things spread. And that was all very sort of simplistic models. You know, do you sort of wear out boots or not wear a funny hat or not a funny hat?

Starting point is 00:06:44 Or perhaps, you know, take on a political belief or not. But it was all sort of physicist sort of models, simple things. And they tell important stories about systems. And sort of out of that eventually over many years, started to sort of think more and more about the deeper things that people might run around with, which are stories, you know. And they can range from very simple, like proverb-type stories. The U.S. has rags to riches, right?

Starting point is 00:07:07 The American Dream is a really fundamental kind of story. And trying to, you know, so how do you then start to measure those things? How do you, right? And I'm really, I do come from a physics background. Good. Well, I, good and bad. That's good, yeah. So, you know, stat-mech kind of stuff.

Starting point is 00:07:29 So, you know, just if you look back through physics, through thousands of years, we had some pretty crazy ideas about how things worked, right? And that's how science has to progress. But measurement just drove everything eventually, right? If you think about one of my examples that I often put out, it's temperature, you know, measuring temperature, which we take for granted now. But that took, well, in the last 500 years, hundreds of years to get to a point where people were like, oh, actually, you can do that. We're pretty happy with measuring distance. You know, measuring time, really hard. Really hard to measure time well. Yeah. I mean, amazing, right?

Starting point is 00:08:05 It was sundials for a long, long time. And time is a big piece in some of the work we've done recently, too. Like, how do you experience it? But so, all right, that's a bit long. But I guess with the big data kind of revolution, and we call it big data because it's about people. I mean, we've had big data in many fields, is there's this kind of blue-collar kind of honest hard work

Starting point is 00:08:29 that we have to go back to, which is just let's really, really look at this stuff and measure it and quantify it. And maybe we had a pretty good time into the 80s and 90s of making simple models and telling all these beautiful stories about the world. But they were gloriously free of data, which, you know, if you have a beautiful idea, probably don't go look at reality, right? Because you might be sad.

Starting point is 00:08:57 Yeah. So, you know, and of course, string theory, right, is, you know, we've got some beautiful examples in physics still. Although that's beautifully done because you can't really ever sort it out. So that, I feel it's just almost just being responsible, right? We're just trying to measure things well, right?

Starting point is 00:09:16 We've got these hard problems. Let's see what we can do. And things have changed tremendously the last 10 to 20 years. All right, so Vonnegut. You know, I think I came across this YouTube video of Vonnegut talking about it. It's probably how I came across it first

Starting point is 00:09:28 and I showed it to my students. I'm like, look, we should be able to do this. This fits in with work that we've been doing for many years before, which was measuring emotional states of populations. And some people in the audience might not actually be familiar with the video. So maybe remind us what Vonnegut's actually saying there. Yeah.

Starting point is 00:09:46 So, so it's sort of a five-minute version fine. I think, because I think he told the story in many places. It's really quite charming. And so he just, he sort of lines up a graph and it's a sort of ill fortune, good fortune on the vertical axis, you know, good fortune to the top, and then time, right? So time is the big, so going to the right there. And then marks out a simple graph and it's, it just sort of starts high and then goes down and comes back up again, like a little wave, right? And then says, this is what he called a man in a hole story, right? So this is a, this is a, you know, many

Starting point is 00:10:24 sitcoms, many stories kind of just work like this. They start off, things go wrong, they get back to where they were. And his little sort of line there was, you know, people love that story. They love it, right? There's nothing about plot in here, and I want to be really clear about this. This is just the overall emotional arc. It gets a bit conflated with plots, and that's a much deeper, harder thing that we're trying to work on as well. So emotional arc. So you think, all right, well, maybe we can, maybe we can do this. And the work that we had sitting around that we built for a long time was this idea of what we call a hedonometer, right? So measuring happiness, but equally sadness, I should

Starting point is 00:11:04 point out. And that came out of older work from maybe 60, 70 years ago now, I think, of trying to measure the fundamental dimensions of meaning. And this to me is really, really, I mean, this is, actually, you know, this is the most exciting thing I've ever worked on, the more recent stuff about that, and we'll get to it. Yeah, I mean, just thrillingly incredible. But the idea is, okay, well, let's, if I can kind of expand on this, let's give, yeah, give people, you know, so trees, cars, you know, your life, like, we have all these aspects of meaning associated with them, how you feel about something, and feeling and meaning are allied in interesting ways. So how do you sort of boil that down, right?

Starting point is 00:11:54 So we, we have, you know, maybe you could look at a dictionary, a thesaurus, and you've got this rich space of meaning, and the more recent work that we have in deep learning and so on is like, here are 300 dimensions of meaning. And it's like, whoa, you know, what could go wrong with that? So we're at the absolute other end of that, which is what's the absolutely most essential aspect of meaning. And what was sort of dug out over decades and through, you know, of course, initially small scale studies with people, obviously students in psychology, right? It's the usual game. But here was the idea, okay, we'll give you a bunch of objects or concepts or whatever,

Starting point is 00:12:33 and you have to just assess them on semantic differentials, and we'll give you a bunch of these. And so they are things like hard to soft, good to bad, big to small, like all these kind of very natural things, that we're fairly comfortable with them being antonyms, right? They represent opposite ends of some spectrum. And so this was done, as I said, in the 40s and 50s. And the first big work was actually for pings from submarines, which is quite charming. It's a really interesting work. And it's some handlers, you know, some people working with radar, how did they feel about the sounds?

Starting point is 00:13:12 Like did it kind of danger, energy? What did it mean to them? So that kind of spread out from there into thinking about meaning of anything. And what was sort of boiled down over many years was this idea of valence being dominant. And it's a nicely inscrutable word, but it does generally. And I think that's not unuseful, but it means good to bad, basically, right? So happy to sad, so collapsing a lot of things. And so you can imagine from an evolutionary point of view, like a sort of a survival point of

Starting point is 00:13:45 view, you know, you're an organism, you have a sense of what's good and helpful and positive and negative, and you're attracted to one end and you're repelled from the other. So it had this very sort of fundamental aspect to it. There are a couple of dimensions that came out. And the tricky thing is you've started with hard and soft, light and heavy. You've started with all these very sensible ones.

Starting point is 00:14:09 And you have to figure out then, because what's really going on, we're solving like an SVD type problem, a linear algebra problem. You have to explain what SVD means. Yeah. So singular value decomposition. What you're trying to figure out is if you've got all of these axes, like these semantic differentials,

Starting point is 00:14:29 if we sort of take the right point of view, it may be that there's some way of adding them up and subtracting some from the others to get a really fundamental kind of dimension. Like you might see that there's a shape in front of you. So words have points in this space, right? You can imagine words or things. But let's talk about words. So I'm going to present you with a word, you know, football or chicken, and you have to rate it on all of these different semantic differentials.

Starting point is 00:14:57 So then it has some point, these words have a point in the space of semantic differentials. And then the idea is we'll rotate that space around and play with it a little bit. And maybe we see, oh, it's kind of, you know, really dominant in these ways. And say that this valence dimension is, you know, it's a sum of all of these things in some complicated way, but maybe, you know, the good, bad, semantic differential probably lines up with it.

Starting point is 00:17:14 So you're taking all these words and you have many different possible axes along which your students, I guess, or subjects are rating them. But some just correlate exactly with others and sort of that's kind of redundant information. And you're looking for what are the ways, what are the axes if you like, that matter the most? Is that a fair way of saying it?

Starting point is 00:17:47 That's right. That's right. And they're not, you know, it isn't any one of those semantic differentials that you started with necessarily at some way. You have to,

Starting point is 00:17:55 you have to go through that and figure out, yeah. Okay, these ones are kind of, it's, you know, it's not like mass and length and those sorts of things, right? Now we're dealing with categorical.

Starting point is 00:18:05 It's, it's, so it takes a, you know, you have to sit down as a human, I think, and really kind of think through this. So, so that's right.
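The SVD idea described above can be made concrete with a small sketch. This is an illustration only, not the lab's actual pipeline: the words, the four semantic differentials, and every rating below are made up, and the code assumes NumPy is available. It centers a words-by-scales ratings matrix, takes its singular value decomposition, and reports how much of the variation each rotated axis accounts for.

```python
import numpy as np

# Hypothetical data: rows are words, columns are semantic differentials.
# Every number here is invented purely for illustration.
words = ["love", "war", "kitten", "tank", "funeral", "party"]
scales = ["good-bad", "happy-sad", "strong-weak", "big-small"]
ratings = np.array([
    [ 3.0,  3.0,  1.0,  0.5],
    [-3.0, -3.0,  2.0,  2.0],
    [ 2.5,  2.5, -2.0, -2.0],
    [-1.0, -1.0,  3.0,  3.0],
    [-2.5, -3.0,  0.0,  0.0],
    [ 2.5,  3.0,  0.5,  0.5],
])

# Center each scale so the decomposition describes variation around the mean.
centered = ratings - ratings.mean(axis=0)

# Rows of Vt are the rotated axes: weighted sums and differences of the
# original scales. Singular values in s come back in descending order.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Share of the total variation captured by each rotated axis.
explained = s**2 / np.sum(s**2)

for share, axis in zip(explained, Vt):
    loadings = ", ".join(f"{name}: {w:+.2f}" for name, w in zip(scales, axis))
    print(f"{share:.0%} of variation -> {loadings}")
```

In this toy data the good-bad and happy-sad columns are nearly redundant, so they load together on the leading axis, which is the sense in which a valence-like dimension is a combination of the original scales rather than any single one of them.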

Starting point is 00:18:13 And the work that we did initially to get this hedonometer stuff to work was to, I mean, essentially, actually, many years ago, I'm trying to figure out, okay, we've got all this data coming through, like blogs. It was a little bit before Twitter and Facebook really took off. But we looked at some other things like State of the Union speeches. There's hundreds of years there. Music lyrics for which we had, you know, 60, 70 years. So we're trying to get hold of different kinds of text, so text as data, that represented some aspect of human behavior.

Starting point is 00:18:44 None of these things are complete, of course. We wouldn't want to say that. But we thought, well, we've got the stream of, say, words coming through in real time. Can we figure out, like, is this population that's expressing it happy or sad? Or, you know, are they fearful or less fearful? And partly inspired by some of the things that were coming out of economists around the time. Greenspan, 2007 and 8, said, you know, he would throw out all of these mathematical models if he could figure out why people are becoming more euphoric or fearful.

Starting point is 00:19:20 People could probably find that interview. It's on the Jon Stewart. It's on the Daily Show from a long time ago. It's quite a remarkable. It's before the housing crash too. Yeah. So, you know, I would carry that around as a good example. Like, you know, that seems a really basic thing to know. And of course, we want to put it up against something like GDP. You know, sure, the stock market's up, but are people happier or sadder? And it goes back to measurement. If you want to improve things, I think we're in this kind of really difficult time

Starting point is 00:19:51 where we can measure some big, complicated things quite well, especially money, or at least we think we can. But we're leaving out these other mushy, harder pieces to measure. And as a result, of course, you try to maximize or optimize something that's not measuring everything. I mean, I think people understand that, but you sort of also forget it.

Starting point is 00:20:08 You look at the things that have charts. Like, look, Yeah, it's the stock market. You look under the lamp post, yeah. Yeah. So that was part of our challenge. I mean, I think it was a fundamental thing about people we were trying to measure as populations. And we're not really trying to, we're not trying to track individuals.

Starting point is 00:20:25 It's nothing that we would say, oh, you said this sentence, you're happy or sad. It has to be from many, many, many words. Right. So it's more like a physics-ish kind of that you're averaging over lots of pieces. So it kind of has an inbuilt privacy thing, if you like. We eventually created something online which is at hedonometer.org and it takes Twitter data. And that kind of sort of the banner thing is Twitter. And you can see over many years now, 13, 14 years, this sort of long arc of what Twitter is a complicated thing.

Starting point is 00:20:57 People have changed who's actually active on it. I think we have 10 languages, Russian, Korean, it's a whole thing. But it's exactly this kind of index, if you like. What's the, you know, the Dow Jones index of happiness. And it has some big patterns. It's been going down, actually, for five or six years, but more recently has been kind of going up.

Starting point is 00:21:20 Sorry, the happiness has been going down? Yeah, since about 2015. Huh, weird. Yeah, going down. But the last year, it's been sort of slowly going up. 2020 was the first time we saw anything that I would call collective trauma. And, you know, of course, there's your own personal view of things, and that's what we're trying to take out of this,

Starting point is 00:21:43 like what we think about things. We're trying to get a sense of a population. And, you know, your listeners will have all of their own specific kind of feelings of how things have, you know, maybe 2014 was the worst year, you know, personally, right? But we're trying to get out the whole picture. And by collective trauma, what I mean is the advent of, you know, the world kind of understanding there was a pandemic. We sort of knew in January 2020 that there were dangerous things afoot.

Starting point is 00:22:11 But it wasn't really until, I think it's March 12th when the NBA suspended its season. Tom Hanks said he had COVID. All these things happened in about 10 minutes. And President Trump at the time gave a speech, sort of saying for the first time things weren't great. And the stock market, of course. The stock market started to tank straight away. So that was a big drop. And it also did, what we'd seen in the past is there were these big drops for deaths of celebrities, terrorist attacks, school shootings, you know, these things that occupy us.

Starting point is 00:22:48 But then they've really quickly been wiped out by stories. You know, like people still talked about those things, but there's just this flood of stories all the time that, you know, of everything that's happening in the world. So there'd be drops, but they'd kind of come straight back up, maybe a couple of days. But it took on the order of months, really, for Twitter to sort of rebound back up to its kind of normal level at the time, which is pretty low. And then George Floyd's murder was a huge drop, but it kept dropping as the protest built over the next few days because of people understanding what had happened and being out, you know, expressing their feelings online or what we measured as feelings. And that's the lowest drop we've ever seen. And again,

Starting point is 00:23:33 it took a long time to come out of. January 6 was another big drop, actually. That's probably the third lowest over the whole time. So this was, you know, that's a whole, the many things have kind of come out of all of that where you can measure happiness of texts in lots of ways. And to finally get back to Vonnegut, what we did was we went to books and we said, all right, let's see what he could... This is this idea of Vonnegut's, and he actually, you know, he says, this is so simple even computers could do it. You know, this is maybe 1990, '95 when he was saying these things, and we thought, we can probably do it. And so, in fact, it turns out that I think in maybe

Starting point is 00:24:13 When was it, 60s or 70s? He had, I think it was the University of Chicago. He wanted to do this as a master's thesis. Right, right, he had presented it and they said no, and he was still mad about that for decades and decades and decades. You can find him, you know, talking about how upsetting that was to him. So it's sort of an homage to him in some ways. But we got a bunch of books, maybe I think 20,000. You have to sort out what's fiction. It's a bit of a mess. But basically created this same hedonometer idea. But in this case, you're now sliding through the book. So you're going to say, okay, the first 1,000 words have this score. And then we'll

Starting point is 00:24:54 slide this little window. And we're not reading it like a person, right? We're just, it's like sometimes called a bag of words method. You're just going to put them all together and slide and get a score for them. Right. So not all words we have scores for, right? And some words we, you know, say the score is unimportant. Like the word "the" is a neutral word. Yeah, we ask people what they think of that word, right? So we, as I said, we had psychology, people do it with psychology students, of course, early on. You know, eventually it's online. You do it with Mechanical Turk, which is an Amazon service where you ask people what they think about things. Or you can use it for all sorts of things. But so the, you know, the scale of these

Starting point is 00:25:39 studies is now really quite large. So we have, you know, scores for words. Yeah, so you sort of separately score the individual words, and now you're taking novels or what have you, works of fiction, and scoring, as you say, sections of those as you go through the text. And so you can see the happiness or the sadness go up and down as you read through the text. Right. And you play around with the window size

Starting point is 00:26:05 and you think about this. We did it for movie scripts as well. Scripts are useful. They have descriptions of what's going on. So they're actually somewhat rich. You can't get the final one, which I realized as we were doing this because I was looking at Alien and I was looking through the script and Ripley is a man in

Starting point is 00:26:25 what might be the fourth, the last script version of that. Anyway, so you've got some version of it and you do what you can. So, you know, if you look at something like Predator, starts okay and then just goes to like, it's terrible, you know, like it's just negative and drops. There's no sort of, you know, ups and downs, which we're more familiar with in stories. It's like, so Harry Potter, the last, the Deathly Hallows, the last book, you know, really huge ups and downs as it goes through, right? So, you know, I think that's, we're trying to figure out what are the sort of characteristic scales of fiction. So, but what came out of that, and we attacked it in various ways, but there are sort of six fundamental shapes, if you like. And there was the rags to riches one,

Starting point is 00:27:16 so very simple, basically sort of goes up throughout the book. You know, may have some ups and downs, but that's sort of a, you know, this is like kind of like decomposing something into, decomposing a sound, you know, into its Fourier waves or whatever you like. It's a bit like that. And I want to add something that's much more complicated, though.
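The sliding-window, bag-of-words scoring described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the hedonometer's implementation: the word list, the 1-to-9 scale, and the scores in `WORD_SCORES` are hypothetical, and `emotional_arc` is a name invented here. Words without a score are simply skipped, mirroring the point that not every word is scored.

```python
from typing import Dict, List

# Hypothetical happiness scores on an assumed 1-9 scale; illustrative only.
WORD_SCORES: Dict[str, float] = {
    "love": 8.4, "happy": 8.3, "win": 7.8,
    "lost": 2.8, "war": 1.8, "death": 1.5,
}

def emotional_arc(tokens: List[str], window: int, step: int) -> List[float]:
    """Average the scores of the scored words inside each sliding window."""
    arc: List[float] = []
    last_start = max(len(tokens) - window, 0)
    for start in range(0, last_start + 1, step):
        # Bag of words: order inside the window is ignored.
        scored = [WORD_SCORES[t] for t in tokens[start:start + window]
                  if t in WORD_SCORES]
        if scored:  # skip windows that contain no scored words
            arc.append(sum(scored) / len(scored))
    return arc

# A toy "story" that starts well, goes badly, then recovers.
text = ("happy love win happy the the lost war death war "
        "the the love win happy love")
arc = emotional_arc(text.split(), window=6, step=5)
print(arc)
```

With a real book the window would be on the order of the thousand words mentioned above, and, as noted in the conversation, the window size is something to experiment with.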

Starting point is 00:27:34 So, but this is, of course, we're looking at emotional arc. So we do have signals. There's the tragedy where things just keep going down. So metamorphosis, maybe Kafka, right? It starts off badly. You're a cockroach and stuff with them, and it keeps going down. And then there's the man in the hole type one of Vonnegut. There's the inverse of that, which we called Icarus, right?

Starting point is 00:27:53 So it starts, things go really well, and then they go really bad. And then we had two others which were Cinderella and Oedipus, right? So Cinderella starts low, goes high. You know, you've gone to the ball, the fairy godmother's turned up, and then things go badly again. And then, you know, so there's a huge rise. And that's one of Vonnegut's, you know, favorite little stories that he talks about, Cinderella fitting this pattern.

Starting point is 00:28:17 So it's a simple, you know, down up, down up. And the flip of that we had, we called that Oedipus, right? Starts well, things go bad. Then you kill your father and marry your mother. You know, like it ends poorly. So, I mean, yeah, just to, because sadly we don't have the visuals here for the audience, but I saw your plots, though, the visuals are great, and plots in the sense of graphs,

Starting point is 00:28:43 not plots in the sense of story structures. But it is a, it's, I mean, what fraction of stories fit into these? Because it's a very simple kind of ex post facto natural thing. There's sort of the stories that have no maxima or minima in the evolution, right? It's either rags to riches or tragedy.

Starting point is 00:29:02 And then there's stories with one maximum or minimum, and there's stories with two maxima or minima in that basic arc. Is that, like, are those six possibilities, what fraction of stories are covered by that? I mean, it's some, you know, it's, it's again one of these things where it's like 90, 95%. It's amazing, yeah. But of this particular pool of books, right? So, you know, in this set of works.
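The framing in the question above, zero, one, or two turning points plus the starting direction, can be written down as a toy labeler. This is only a sketch of that counting argument and is not how the study extracted its six shapes; `label_arc` and `SHAPES` are names invented here, and a real arc would need smoothing first, since noise creates spurious turning points.

```python
from typing import Dict, List, Tuple

# Sequence of directions (+1 rising, -1 falling) mapped to the shape names
# used in the conversation.
SHAPES: Dict[Tuple[int, ...], str] = {
    (1,): "rags to riches",
    (-1,): "tragedy",
    (-1, 1): "man in a hole",
    (1, -1): "Icarus",
    (1, -1, 1): "Cinderella",
    (-1, 1, -1): "Oedipus",
}

def label_arc(arc: List[float]) -> str:
    """Label an already-smoothed arc by its run of rises and falls."""
    directions: List[int] = []
    for prev, cur in zip(arc, arc[1:]):
        if cur == prev:
            continue  # flat steps carry no direction
        sign = 1 if cur > prev else -1
        # Record a direction only when it changes, so each entry is one
        # monotone stretch and each boundary is one turning point.
        if not directions or directions[-1] != sign:
            directions.append(sign)
    return SHAPES.get(tuple(directions), "other")

print(label_arc([8.2, 2.0, 8.2]))       # one minimum
print(label_arc([2.0, 6.0, 3.0, 7.0]))  # a maximum, then a minimum
```

Anything with more than two turning points, or with no movement at all, falls into "other", which corresponds to the remaining few percent mentioned in the answer.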

Starting point is 00:29:27 So I think the future of this, of course, is to curate things really well. Like here are detective stories. Here are stories from this particular culture and so on. So it becomes a, and we found this with the hedonometer work in general, if you estimate the happiness of a set of words, you might say, oh, okay, maybe I can get an error measure for that, right? This is a very typical thing to do with measurement. But it turns out it was completely in the lens. It's completely in the words, the list of words for which you have scores. So if you change that list of words by scoring more

Starting point is 00:29:58 or taking some out, you know, that's where the error is. It's all in the instrument. So, you know, in this case, yeah, it's one of these things where we seem to have a big data set. We have 20,000 books. That's a hard thing to read, right? So this is beyond it. Right. This is important. It gets beyond it. No one's going to read 50 million tweets a day. And so what we're trying to do is what I sort of call telegnomics, which is like distant sensing of knowledge. Right. So tele is far, and it's gnomics, so like far knowledge. And because, yeah, there's no way an individual can do that. And we want to get some sense of what, you know, the whole thing, sort of streaming.

Starting point is 00:30:39 these tweets stream past you in three seconds, how would you feel? Pretty bad probably, just in general. But, you know, taking that part out, you know, is it better or worse than yesterday. I want to say that the man in the hole one, which is this favorite one of Vonnegut's. So I would say that the framing of that is not great, actually, because, I mean, you know, he's sitting there, he has a drawing, not like we're struggling here with the podcast, but he has a drawing so you can kind of see it in front of you and it all makes sense. But man in a hole doesn't tell you a sense of time. It doesn't give you an arrow, right? So Metamorphosis could be man in a deepening hole, as it turns out.

Starting point is 00:31:22 But a person in a hole, it doesn't tell you that they start okay, they get into the hole and they get out. And I guess I think a lot about ads and slogans and so on. And it struck me before the 2016 election that Make America Great Again was the man in a hole arc. And it was in four words. It tells you about, you know, it indicates something about the past, the present, and the future, which is, you know, really powerful.

Starting point is 00:31:47 And it's, as I understand it's 1980, I think, it's Reagan and Bush. It was used in ads, like, Let's Make America Great Again. It was used in posters and so on. It wasn't quite the dominant slogan. But it's one of those ones that's really powerful. Bill Clinton used it. Lots of people have used it over the years in various ways,

Starting point is 00:32:06 because it is very powerful. I mean, and I think that, you know, as a rhetorical, as a story in four words, super powerful. And do you find that there are, you alluded to this a little bit, but relationships between these different kinds of story arcs or valence arcs, whatever you want to call them, and genre or literariness of the fiction? I mean, are there certain kinds of,

Starting point is 00:32:30 do you get highbrow fiction using one kind of pattern and potboilers using another one? You know, we are working on that more now. We have some work where we're looking at things like accounting textbooks and, you know, manuals for televisions. And, you know, just like, what happens? Because you want to know, like, are we getting something artificial? Certainly if you randomly shuffle text, it, you know, doesn't produce these shapes, right?
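
The shuffle check mentioned here can be sketched roughly as follows. This is only an illustration, not the lab's actual pipeline: the tiny lexicon, its scores, the window size, and the toy "story" are all made up for the example. The idea is to compute a sliding-window average of word scores for a text, then do the same for a randomly shuffled copy and compare how far each arc swings.

```python
import random

# Hypothetical toy lexicon: word -> happiness score on a 1-9 scale (made-up values).
SCORES = {"love": 8.4, "happy": 8.3, "home": 7.1, "win": 7.5,
          "lost": 2.8, "pain": 2.1, "death": 1.5, "alone": 2.9}

def arc(words, window):
    """Mean lexicon score in each sliding window of `window` scored words."""
    vals = [SCORES[w] for w in words if w in SCORES]
    return [sum(vals[i:i + window]) / window
            for i in range(len(vals) - window + 1)]

def spread(series):
    """Range of the arc: how far it swings between its high and low points."""
    return max(series) - min(series)

# A toy "man in a hole" text: good start, bad middle, good end.
story = (["love", "happy", "home", "win"] * 5
         + ["lost", "pain", "death", "alone"] * 5
         + ["love", "happy", "home", "win"] * 5)

# Null model: same words, random order (seeded so the run is repeatable).
rng = random.Random(0)
shuffled = story[:]
rng.shuffle(shuffled)

real_arc = arc(story, window=8)
null_arc = arc(shuffled, window=8)
print("ordered text swing: ", round(spread(real_arc), 2))
print("shuffled text swing:", round(spread(null_arc), 2))
```

Shuffling keeps the overall word frequencies, so the average score is unchanged; only the ordering, and with it the shape of the arc, is destroyed. One would typically expect the shuffled swing to be noticeably smaller than the ordered one.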

Starting point is 00:32:55 There's, I mean, as you might hope, right? So there's sort of a, we can at least get that sorted out. But again, that's a curation of data that I think we're still behind on. We're trying to build, well, we do have this thing called Storywrangler. It's at storywrangling.org. And it's for Twitter at the moment. But the idea is to kind of house all of these different bodies of work

Starting point is 00:33:21 and have time series for their usage of words within them. So that hopefully eventually will be something that could kind of go towards what you're saying. We do, of course, have Google Books, which has been around for about 10 years now. The problem with that, I think, is that it doesn't have enough metadata. You can't really sort it beyond, broadly, fiction and, broadly, everything. And as it turns out, we did some work on it, and we figured out that actually the kind of collective English stuff is full of science. There's a lot of medical and science type writing. And the 20th century is basically dominated by the sort of rise of science. And you can see it in little details like figure with

Starting point is 00:34:04 the capital F, it just goes up. And like, you know, et al. and all the things to do with data are really actually about the exponential growth of science, which was sort of understood, I suppose, in the '60s. De Solla Price, presumably armed with a million graduate students, went through libraries, figuring out, you know, what the memory was in journals and how much stuff was being published. And anyway, that's imprinted in there in a way that we can't. Well, it makes sense. I think I noticed on your webpage that

Starting point is 00:34:34 the most commonly used word on Twitter is RT, the abbreviation for retweet. That doesn't really mean it's the most commonly used in English, but on that particular medium, that's what pops out. Right. And you have to say, you know, what are you looking at? Twitter is interesting because it does kind of encode so much. And the news is for sure there. I mean, another way to look at all of this is to think about, you know, forests, right?

Starting point is 00:35:00 So we have a forest and you would like to know all the species in the forest, which is actually, of course, very hard to measure, and have the counts for them, right? How many are there of all these different species? So this is, it comes out of linguistics, the types and tokens distinction. Like, what's your lexicon, which for language would be, here's your list of words, and for the forest, here's your list of all the animals and organisms. Yeah. And then you have next to it the counts, right?
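
The types and tokens distinction described here can be shown in a few lines. This is a minimal sketch: the two example "days" of text are invented, and the whitespace tokenizer is a simplifying assumption. Types are the distinct words (the species list), tokens are every occurrence (the head count).

```python
from collections import Counter

def types_and_tokens(text):
    """Return (counts per type, number of types, number of tokens)."""
    tokens = text.lower().split()   # naive whitespace tokenizer (an assumption)
    counts = Counter(tokens)        # the lexicon with a count next to each type
    return counts, len(counts), len(tokens)

# Two invented snapshots, standing in for the same "forest" on different days.
day1 = "rt the storm is coming rt the storm"
day2 = "rt the sun is out the sun the sun"

for label, text in [("day1", day1), ("day2", day2)]:
    counts, n_types, n_tokens = types_and_tokens(text)
    print(label, "types:", n_types, "tokens:", n_tokens, counts.most_common(2))
```

Repeating the count for each time slice gives a time series per type, which is the kind of record the forest analogy is pointing at.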

Starting point is 00:35:29 But then you want to do that over time. So maybe for forests, it's at the scale of a year. There are studies that do this for small parts of forests. But we're sort of looking at forests of words and stories and trying to see how they change over time. Of course, they can change dramatically.

Starting point is 00:36:51 I mean, I don't want to lose track of this other thing that we mentioned and then sort of buried in the happiness versus sadness discussion. But there is this multidimensional way of thinking about the words. And you've done your factor analysis to try to figure out what dimensions matter the most. And why don't you tell us what those dimensions are that matter the most? Yeah, I actually just wrote that down. So this has been, I think, I'm really excited about it. I mean, it's still in review, so we'll see what happens.

Starting point is 00:37:23 But we're pretty confident about all of this. So all right, so we had valence. And we sort of saw at the time in the literature that this was the dominant axis. And certainly when you look at the data back then, and I think we're looking at data sets that had a thousand words with scores associated with them. So it's not a big set of words, right?

Starting point is 00:37:45 People's vocabularies are tens of thousands, you know, something like Twitter with all its misspellings is hundreds of thousands. Yeah. So, but you know, you want something on that order. And over time, what has happened, of course, there have been bigger studies done and done in slightly different ways. And so we've gotten, just as you might hope in science,

Starting point is 00:38:05 you know, more accurate, richer works. Back then, 12, 13 years ago, the main idea of what was going on was there was valence, which is this happy, sad, good, bad kind of axis. And there were a couple others, though. One was about dominance, like, do you feel in control or not in control

Starting point is 00:38:25 when you kind of consider something? And there was another one which is activity. It's got various, you know, names for it, but basically kind of activation. Is this exciting or boring? So there have been these other sort of secondary dimensions that people have had floating around, and then sort of debates about which ones matter. Okay, so we've got this work, we didn't do this study, but a couple of years ago, it's work by Mohammad in Canada. Again, online, many, many people doing these evaluations, and now we've got 20,000. Now it's 20,000. So there's a huge jump from, say, 1,000

Starting point is 00:39:04 and work that we did, which got to 10,000. And they're mostly good kind of words, right? They don't have people's names or events, which can be a bit of an issue with some of these large sets of words. All right. So the idea was that people were going to evaluate these on valence and what was called arousal, which is the activation one, and dominance, right? So you're given these three dimensions.

Starting point is 00:39:29 And it is tricky. How do you present that to people? So you kind of have to give them kind of classes of words at each end. Right. So they kind of know that, you know, the positive end of valence is, you feel good, you feel happy, you feel maybe comforted, you know, it's a bit spread out. But that's fine.

Starting point is 00:39:46 Everyone's given that same, you know, those same instructions. But looking harder at this stuff and then, again, doing this kind of factor analysis, you can see you've got this kind of, now it's a three-dimensional space. Maybe it doesn't come back in the dimensions you've actually tried to impose, that you've tried to say to people, you know, we think these are the fundamental dimensions. That's good, but you can see like what they actually think. You know, maybe they correlated some of those. And that actually turns out to be the case.

Starting point is 00:40:15 And if you sort of rotate this football and kind of squeeze some of the axes and pull some of them apart, you get another shape. And it has these two main, well, we played around with it for a while, but it has these two main axes. And the one going across, if you like, horizontally, is power: powerful to weak. Right. So power over here, for people, will be success, triumph, if you sort of go back to people. So it's kind of winning. And then the weak end of that is void and nothing. So it's not failure. It's just emptiness. So that's going across the page. And then pointing up, and this is our choice, is danger. So this is like a compass for basic meaning. And so danger is up and safety is down. And we call this, all right, we make up words, but ousiometry, right? So ousia means, you know, I mean, we're taking it to mean essence. It's a Greek word, but it is where the word essence comes from. So O-U-S-I-A.
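
To make the "rotate the football" step concrete, here is a small two-dimensional sketch. It is a stand-in under stated assumptions, not the factor analysis discussed in the conversation, which works on three rated dimensions and thousands of words. The points below are invented ratings on two correlated scales; the code finds the angle of the cloud's main axis from its covariance and re-expresses the points in the rotated axes, where the two coordinates are no longer correlated.

```python
import math

def principal_angle(points):
    """Angle (radians) of the main axis of a 2-D point cloud, from its covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def rotate(points, theta):
    """Coordinates of each point in axes rotated counterclockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]

# Invented, centered ratings on two scales that raters tended to correlate.
ratings = [(1, 1), (2, 2), (3, 3), (-1, -1), (-2, -2), (-3, -3), (1, -1), (-1, 1)]

theta = principal_angle(ratings)
rotated = rotate(ratings, theta)
print("rotation angle in degrees:", round(math.degrees(theta), 1))
```

The design point is that the axes handed to raters need not be the axes the data prefers; a rotation chosen from the data can give a cleaner description of the same cloud.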

Starting point is 00:41:18 And we felt, I mean, it's fun to make up words, but it was also like, it's not semantics, it's not semiotics. And we're not measuring meaning; we don't want people to think that. We're measuring essential meaning, if you distill everything down. So we've tried this out with many different corpora, so things like the Sherlock Holmes novels and stories, short stories. Jane Austen's works, so they're sort of famous authors. And then a huge collection of fiction from Google with a sort of complicated thing, but it's 120 years. So that's everything sort of smushed together equally. Wikipedia, a snapshot of Wikipedia, which is a, you would think, just a different object.

Starting point is 00:42:03 Talk radio, so that's transcriptions of talk radio. So now we're going for spoken word. It's been turned into text, but that's different. It's spontaneous. It's different. It includes everything from, you know, NPR to sort of shock-jock stuff, right? So it's a big grab bag. The New York Times as well, 20 years of the New York Times.

Starting point is 00:42:23 And so we look, so this is this type token distinction again, right? So that sort of first work where we found this danger-safe axis and power-weak axis was looking at types. Like every word got one vote. Yeah. And so we kind of figure it out. And that's good, but it's just the substrate. And then you have to go and see what people actually, you know, in these different venues and beyond, how do they use these? How often do they use these? Now I want to do this with all of my podcast episodes. I have transcripts for all of them. I want to know which ones are powerful or weak or dangerous or safe. Yeah. No, I mean, it's, I mean, we kept, and when I, those ones I just listed, you know,

Starting point is 00:43:02 we didn't do them all at once. They were sort of like, you'd kind of corral the data and then kind of do the same analysis again. I remember every time thinking, will this be different? What, you know, what's going on? They're all a little bit different, of course, but all of them have what I'll call a safety bias, that the predominance of words that people use are in this lower half of this kind of disc, if you like, and it's words that trend towards being safe. You know, at the bottom are things like comfort. If you want to go out to safe weak, you get words like sofa and tortoise. And then, you know, safe and powerful are words like wisdom and happiness. And that turns out, that quadrant that we'll

Starting point is 00:43:50 call the safe, powerful quadrant, it sort of lines up with positivity and happiness. And there's this much older work that we built on when we looked at large text, which came up with this idea of the Pollyanna principle, that in general interactions between people, and so this is communication of all kinds, there are more positive aspects than negative ones. It's a bit surprising to people, I think, because it's easy to kind of bring to mind arguments or negativity maybe online, or the news is terrible, you know, these sorts of things. But if you think, you know, society exists, it can not exist, but it does exist and it does hold together from lots of little sort of positive interactions. And I, so this was work,

Starting point is 00:44:37 this is maybe six or seven years ago. What was, I mean, I didn't expect this, it was surprising. It sort of popped out that there are more positive words than negative words. And it's true across, we looked at 10 major languages, 24 corpora, you know, Russian, German, Korean, Indonesian. So we looked at a lot of different pieces there and it really kept coming out. So, and, you know, there is a sort of a story there. I mean, that language is our great social technology, right? We're excited about Snapchat or something. But, you know, really language is this unbelievable thing that we have.

Starting point is 00:45:15 Money is another one, I suppose, perhaps, you know, because we've somehow encoded belief into this abstract thing. It's pretty weird. Do I remember correctly, though, that in fictional stories in particular, there's more danger than you might expect, or, I mean, than you have in ordinary language, because obviously a story wants to be exciting somehow. It's a good question if it's more. So all of them have on average a positivity bias.

Starting point is 00:45:41 Okay. Now, there are parts where they dip below to this negative side of things. But, you know, if you look at music lyrics, one of the first things we looked at, by the way, and it kind of told us that we were getting somewhere. The rankings, so at the bottom is heavy metal. Right.

Starting point is 00:45:57 The bottom of what? Of the graph? Yeah, well, this is sort of like ranking, like taking genre, this is something where we were, and you did ask about genres for fiction, but this is actually something where we did have genres. And this is on this, this is on the happiness one. Okay.

Starting point is 00:46:12 So this is, yeah. And at the top is gospel and soul, right? So it kind of made, the ordering looked pretty good for this very rudimentary instrument we'd made. But even, you know, heavy metal, it was still above neutral on average, right? It's still good to know. Even though, yeah, it's just still above. So if you look at a, you know, maybe Harry Potter or something like when things go bad, it does dip into this negative thing, which is pretty hard because you've got to use a lot of negative words because on average, the bulk of words are over in the positive side of things. Or at least, you know, there's a skew towards positive.
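
A rough sketch of this kind of genre ranking, and of the earlier point that the error lives in the word list itself, might look like the following. Everything here is hypothetical: the lexicon, its scores, the neutral point of 5 on a 1 to 9 scale, and the two snippets of "lyrics" are invented for illustration and are not the actual hedonometer data.

```python
from collections import Counter

# Hypothetical lexicon on a 1-9 scale, with 5 taken as neutral (values made up).
LEXICON = {"love": 8.4, "praise": 7.6, "light": 7.3, "night": 5.5,
           "fire": 4.9, "blood": 2.9, "death": 1.5}
NEUTRAL = 5.0

def average_happiness(text, lexicon):
    """Frequency-weighted mean score over the words the lexicon covers."""
    counts = Counter(w for w in text.lower().split() if w in lexicon)
    total = sum(counts.values())
    if total == 0:
        return None  # nothing in the text is covered by the instrument
    return sum(lexicon[w] * n for w, n in counts.items()) / total

# Invented stand-ins for two genres of lyrics.
lyrics = {
    "gospel": "praise the light love love praise",
    "metal": "blood fire death night love light fire love",
}

ranking = sorted(lyrics, key=lambda g: average_happiness(lyrics[g], LEXICON),
                 reverse=True)
for genre in ranking:
    print(genre, round(average_happiness(lyrics[genre], LEXICON), 3))

# Same text, different instrument: drop one word from the lexicon and rescore.
smaller = {w: s for w, s in LEXICON.items() if w != "love"}
print("metal without 'love' scored:",
      round(average_happiness(lyrics["metal"], smaller), 3))
```

In this toy setup the darker text still averages above the assumed neutral point with the full lexicon, and falls below it once a single word is removed from the list, which is one way to see why the choice of scored words matters so much.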

Starting point is 00:46:45 So the generalization of that now is that, in fact, it's a safety bias. It's not really positivity. It's that we're using more safe words. And dangerous words, you know, they're incredibly important. They describe all of these things that can go wrong. We just don't use them as much. And when we use them, of course, they're incredibly meaningful. But so happiness is basically, yes, safety plus power.
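
One way to read "happiness is safety plus power" geometrically is as a projection onto a diagonal of the power and danger compass. The sketch below assumes, purely for illustration, that the happiness axis is the unit diagonal pointing into the powerful and safe quadrant; the exact direction used in the research is not stated here.

```python
import math

# Assumed direction of the happiness axis on the (power, danger) compass:
# the unit diagonal pointing into the powerful-and-safe quadrant.
HAPPY = (1 / math.sqrt(2), -1 / math.sqrt(2))

def project(shift, axis):
    """Signed length of a 2-D `shift` along a unit-length `axis` (dot product)."""
    return shift[0] * axis[0] + shift[1] * axis[1]

toward_danger = (0.0, 1.0)        # a move straight up the compass
toward_safe_power = (1.0, -1.0)   # a move along the happiness diagonal

print("happiness change for a pure danger shift:",
      round(project(toward_danger, HAPPY), 3))
print("happiness change for a safe-and-powerful shift:",
      round(project(toward_safe_power, HAPPY), 3))
```

Under this assumption a shift straight toward danger registers on a happiness-only instrument as a drop, but a smaller one than the full movement, because the instrument only sees the component along its own axis.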

Starting point is 00:47:14 And the other thing that I thought was really fascinating was that, in different stories that you looked at, you can associate characters in the narrative with, you know, positions along this dangerous versus safe and powerful versus weak axis. And I guess Harry Potter had like all sorts of characters. Like, you know, there are dangerous ones, weak ones, etc. Whereas in Game of Thrones, almost everyone's powerful and a lot of them are very dangerous. It was more like a clash of extremes in that way. So that work, again, it's like completely thrilling to me. This is just incredibly exciting because this comes from a completely different data set. So this is a sort of an online thing. Again, not something we did, but it went back to giving people characters from stories. And there are a lot of TV shows and movies, but there's also Pride and Prejudice, right? So there's some books in there. And zooming out and presenting, it's about 100, it might be 200, but 150 of these semantic differentials. So sort of going way back in time and away and giving people, you know,

Starting point is 00:48:19 so it's for characters. So there's country, city, you know, that kind of rich, poor. There are things that may be a little more, you know, clearly as a scientific people. So we were able to sort of start again with a really rich set of semantic differentials. And I think there are about 800 characters that we looked at over 90 different, I'll call them storyverses, right? There's Buffy the Vampire Slayer, X-Files. You said Game of Thrones. Arrested Development is in there.

Starting point is 00:48:46 It's really a big spread. So I think there's something for everyone, right? You might not know 80% of them, but there will be some that you could look at. So this is a completely different data set. And doing this analysis again, and, you know, turning things around and kind of rotating spaces

Starting point is 00:49:04 and not really doing anything funny where we're saying we're desperate to find this power danger thing. It really popped out for free. So this is something that's, just, you know, very sort of supportive of what we've done in this other space. And there is a third dimension, and I should mention that one, because it's, in general, it's about what we called structure. So structured to unstructured.

Starting point is 00:49:27 So a rock, you know, has a stronger structure level. Cardinal, bureaucracy, boss, right? These are considered more structured, but clown and comical and tickle, these are words that go out, confetti, they're considered unstructured. And so for characters, it's playfulness. It's much more about playfulness. So someone like Robin Hood, right, has a playful measure on them, or Mulder from the X-Files is playful. Scully is not playful. Not a lot of playfulness in Game of Thrones. Pretty much all of them are in the dangerous powerful quadrant, which is the dominant. This is like dangerous winning, basically. You know, things can go wrong for you.

Starting point is 00:50:15 Except for, I'm going to get his name. Samwell. Tarly. If you know that. Tarly, yes. So he's in, he's down in the kind of the angel character. So Jane Bennet is there from Pride and Prejudice. These are people who are, you know, they're more towards the safe axis.

Starting point is 00:50:37 They're still somewhat powerful, but they're more in the safe. So these are just really, really good people. That's who you find down the bottom. If you go around safe into the kind of weak quadrant, then you get people who tend to, you know, they're not bad people, but they tend to get run over. And out on the weak side, you get Michael Scott from The Office. Homer Simpson is out there.

Starting point is 00:51:00 And then I wanted to say that if you go further up, you get, and this is where Joffrey is from Game of Thrones. There aren't many from Game of Thrones in this. What's the dangerous weak quadrant? Yeah, okay. And that's their chaos agent. They're the chaos agents.

Starting point is 00:53:08 And again, I guess this might be future research, but you have this time series of how the valence of the story itself evolves page by page. And now you're saying there's a different set of analyses with sort of the distribution of characters or distribution of whatever, events and so forth. And if you just gave me that, like if you didn't tell me the plot, right, or the characters or the setting or whatever, how much could I learn, how much could I infer about the story just by thinking about, you know, both how it evolved over time and what kinds of characters it involved? Do we know that yet? I mean, we are really trying to do that. And I think it's remarkable. So I sort of think of character as the shortcut to story, right? So what do we do with stories, right?

Starting point is 00:54:12 A lot of them are about prediction. They're about telling us how the world works. Proverbs do this. Stories that we listen to. These are ways that your life can go or maybe other people's lives can. We're trying to make sense of the world. And there are certain, you know, we tend to have stories wrapped around individuals, which I think is interesting, you know, because we want to be in them.

Starting point is 00:54:32 So it's hard for us to tell stories about systems. And that's why, yeah, I mean, when it comes to complex systems, all these sort of phenomena that scientists work on, it's really hard, right? Because people want to anthropomorphize everything. They absolutely do. And I understand that drive, but it's hard. It's hard for us to tell those stories. So, but I think one of the things, so, you know, stories are incredibly important.

Starting point is 00:54:57 It's sort of what I'm trying to say there. But we also can shortcut them by just saying, oh, here's what these characters are like, here are archetypes. And we sort of know what will happen if you say, here are these three people and here they are, we can kind of try to predict what will happen before. So I think they're like little kind of wind-up toys, right? So in our brains, we will try to simulate,

Starting point is 00:55:20 we'll run the dynamical system of these characters interacting. It's very natural. We want to do it. We want to predict. You know, to a fault, obviously. So what we're trying to do now with this is this is tough, But you want to get this sort of danger power profile around a character and how it might evolve through a story as well.

Starting point is 00:55:44 There's the temporal network of which characters are interacting with each other. We should be able to get that out, and with the environments, right? And you could imagine doing this for Star Wars or Lord of the Rings or something like that or, you know, Pride and Prejudice, any of these pieces. So can we kind of trace that through? And it might be pretty rough. You know, we divide books into thirds or something. But then we could, you know, do 100,000 stories and get out one of the big patterns.

Starting point is 00:56:16 Breaking in and seeing this kind of two-dimensional space has been, you know, very helpful in a lot of ways. I mean, I think it's really what it is. Another space we've looked at, perhaps just to start with, is Twitter, because we've worked on that a lot. But looking at at least what was expressed on Twitter for January 6th, the attack on the Capitol, what you see there is, you know, just taking all the tweets and scoring them,

Starting point is 00:56:42 is, you know, measures, it's sort of these measures of energy, like high energy that I sort of mentioned before. And happiness kind of goes down. But really what you see on this kind of compass of essential meaning is that it really points straight to danger.

Starting point is 00:57:00 It actually goes straight to danger, right? So, sorry, upwards danger. Which is, you know, is kind of high energy plus badness in a way,

Starting point is 00:57:08 to use these kind of other frameworks. And so in that respect, you'd see it, you know, it goes down as being a sad thing on our hedonometer, but that's just a projection onto that axis, right? It's a shadow of the real direction, which was pure danger. That's very interesting, especially because one of the questions that I was going to ask was, if you're looking at happiness versus sadness on Twitter,

Starting point is 00:57:37 that's obviously a very interesting thing. But when I actually looked at the data, you know, everyone's happy on holidays. That's a clear winner, right? Christmas, or at least you put out your happy tweets on Christmas. And then everyone's sad when there's a terrorist attack or a shooting. Okay. But other events, like a presidential election,

Starting point is 00:57:55 are more of a mixed bag. And I'm wondering, the simplest possible thing I can think of is just a measure of the variance, right? Like, is it something where a whole bunch of people are happy and a whole bunch of people are sad at an election result? Or is that something that you've quantified? Yeah, we have it. We just haven't put it on the site.
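
The variance idea raised here can be illustrated with a tiny example. The per-tweet scores below are made up, on an assumed 1 to 9 happiness scale; the point is only that two days can share the same average while differing completely in how split people are.

```python
from statistics import mean, pvariance

# Made-up per-tweet happiness scores (1-9 scale) for two hypothetical days.
consensus_day = [5, 5, 5, 5, 5, 5]   # everyone roughly neutral
split_day = [8, 8, 8, 2, 2, 2]       # half elated, half miserable

for name, scores in [("consensus", consensus_day), ("split", split_day)]:
    # Same mean on both days; the variance is what separates them.
    print(name, "mean:", mean(scores), "variance:", pvariance(scores))
```

A single average would report these two days as identical, so a spread measure alongside the mean is one simple way to flag events where the population is divided.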

Starting point is 00:58:15 And I think that's, you know, you're exactly right. So how much, you know, to what degree are people in unison about something? And for the extreme things, just in some ways they have to be, right? Just for those scores to be so high and so low. But, you know, so you're quite right. So there is a predictability to the big spikes in positivity, and they're just annual holidays, right? And so people are using the, you know, the expressions of that time, you know, even happy Valentine's Day. Now, if you look at the words being used and compare them maybe to some other dates, you can see that there's really some negativity in there as well.

Starting point is 00:58:50 It's being swamped by this kind of positive, right? So Valentine's Day will have lonely, right? But it's being kind of, well, Christmas might have that as well. So you want to be a little careful. It's not like everyone is, you know, doing that. They're right there. And you can see it for days of the week. So Saturday is generally the most positive day.

Starting point is 00:59:11 Tuesday is generally the most negative day. But Saturday, you know, it has movies, weddings, you know, like there's lots of positive things that might happen on Saturday. But it also has bored and hangover. You know, there are some, you know, not all. It's not all great for everyone on Saturday. And there's a daily rhythm too, right? Yeah, there's a strong daily rhythm, which I kind of,

Starting point is 00:59:33 I think it's actually in science mixing. I have this line, which is the daily unraveling, raveling of the human mind. So we kind of, I know sleep remains a mystery, but I think we need to be rebooted because I think we just become emotionally unstable by the end of the day. You know, that's, I'm being funny.

Starting point is 00:59:51 But you see swearing goes up through the day, cursing goes up through the day, and, you know, things, yeah. And like you sort of say, the variance goes up as well. The emotional variance goes up through the day. So people start off fairly tight, like things are okay, but it's not emotionally varied as well, right? Yeah. And then the wheels kind of come off, collectively. I mean, if one is being a little bit skeptical here, is it possible that, you know, I might think a lot of people are happy at 7 p.m. because they're enjoying dinner or a movie or whatever, but those people are not on Twitter, right? How much of a bias do we have by the fact that Twitter is our data stream here?

Starting point is 01:00:31 Yeah, no, it's a weird selection. But I will say that more generally, if you zoom out, it does match up with Gallup polls. Okay. Right. Which is kind of wild, right? And we've done some, we have some other instruments. There's one we call the Lexicocalorimeter, which takes in phrases from Twitter and assigns them as to whether they're kind of foodstuffs or about exercise and then assigns calories to them. And so it's at the state level for the U.S. But the rankings you get out of that, because you sort of get these calories in, calories out, which we're not, you know, we're not sort of,

Starting point is 01:01:07 it's a very rough, silly thing. But it matches, it lines up with obesity rates. So you can tell which states have higher obesity rates from Twitter, is what you're saying. Yeah. And you can look at what they're talking about, right? So, you know, Colorado does come out, number one. Vermont, at least in this time, we looked at was sort of three, I think.
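As a rough illustration of the calories-in, calories-out idea described here, a minimal Python sketch follows. The phrase tables, calorie values, and state labels are invented for the example, and the matching is naive substring search; it is only a toy under those assumptions, not the actual instrument.

```python
# Hypothetical lookup tables: calories per mention (illustrative numbers only).
FOOD_CALORIES = {"bacon": 540.0, "pizza": 270.0, "apple": 50.0}
ACTIVITY_CALORIES = {"running": 600.0, "skiing": 450.0, "watching tv": 70.0}


def caloric_balance(tweets_by_state):
    """Return {state: average calories out / average calories in}.

    Averages are taken over matched phrase mentions. States with no food
    or no activity mentions are skipped.
    """
    balance = {}
    for state, tweets in tweets_by_state.items():
        food_hits, activity_hits = [], []
        for tweet in tweets:
            text = tweet.lower()
            food_hits += [c for p, c in FOOD_CALORIES.items() if p in text]
            activity_hits += [c for p, c in ACTIVITY_CALORIES.items() if p in text]
        if food_hits and activity_hits:
            avg_in = sum(food_hits) / len(food_hits)
            avg_out = sum(activity_hits) / len(activity_hits)
            balance[state] = avg_out / avg_in
    return balance


def rank_states(balance):
    """Order states from highest to lowest out/in ratio."""
    return sorted(balance, key=balance.get, reverse=True)


# Made-up tweets for two hypothetical states.
sample = {
    "A": ["went running then skiing", "ate an apple"],
    "B": ["bacon and pizza", "watching tv all day"],
}
sample_balance = caloric_balance(sample)
sample_ranking = rank_states(sample_balance)
```

A ranking like `sample_ranking` is the kind of output that could then be compared against an external measure such as state obesity rates.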

Starting point is 01:01:28 But it was overly fond of talking about bacon, which sort of pushed it down. I would have thought, you know, Coloradans are pretty healthy, outdoorsy people. I don't know. Yeah, so there's lots of skiing and running and biking. Yeah, those words are there. A lot of bacon and donuts. Well, that whole thing is quite amazing to look at because the ground state, though, if you like, for what's being expressed in terms of food and exercise is pizza and watching television, because we have a lot of activities and some of them include like lying down as an activity.

Starting point is 01:01:58 But watching television is one. So the states differ from that, but their baseline is pretty uniform in terms of what is being expressed. And of course, that's advertising. That's all sorts of things. It's a bit of a melange of inputs. And this brings back something that you mentioned right at the beginning. And I thought it was actually, I mean, maybe I had not thought this way before, but I really certainly should have, which is that we very often talk about the traveling of ideas and sharing and contagion of ideas or notions or opinions through social networks and other networks, right?

Starting point is 01:02:37 An idea might be, you know, universal health care or the right to bear arms. But stories can also travel through these information networks and narratives. And is that either something you've done or is it a target to sort of tease out which stories, which narratives are being shared and how useful they are? Because I'm certainly willing to believe that a good, compelling narrative wins every day over a set of facts no matter how true they are. Yeah, I have this thing where I say something like never bring statistics to a story fight. Right. I mean, it's not going to work out for you. So you should bring the numbers, but you've got to bring stories as well. Right. I mean, it's just how we kind of

Starting point is 01:03:20 operate. And of course, people in politics, people, you know, in religion, have been figuring out how to tell stories about things for a long time. So it's absolutely a long-term ambition to do that. It's very hard. But what, you know, we have this sort of framing of story wrangling. Like, how do you get out the stories that people are expressing around an event as it happens and then maybe long-term? So say the Parkland shooting happened. It's a terrible event, just to pull one out of the many. How do you sort of track the stories that emanate from that? And by that time in history, I was pretty sure that there would be a lot of conspiracy theory type things. And sure enough, like I remember going on YouTube the next day and just searching for Parkland. And 18 of the top 20 hits, which was sort of presented as 20, were conspiracy theory things about, you know, that was all faked. It's false flag and so on.

Starting point is 01:04:19 So how do you measure that in real time? I mean, this is an enormous, enormous goal. It's very hard. And then, so maybe there's a blossoming of stories after some event because it's just confusion. Which ones are then fighting against each other? Which ones start to win? I have notions of stories, you know, kind of having hierarchies to them. You want to be able to tell your story simply.

Starting point is 01:04:45 And that's where slogans, I think, have this great effect, and they might not be tethered to some bigger story. Certainly religion's working that way. You want to be able to sort of say things quickly. You know, it's this hierarchy of narratives that you want to be able to deliver. No, it's an incredibly difficult problem, but I think that framing is, well, I want to say the right one, but I think it's a very powerful one to be thinking about what are the stories people are telling. And how much are they reducing stories to sort of

Starting point is 01:05:17 characterization? So, for example, Pizzagate, that story is pretty out there, right? I mean, it's pretty out there. There's a basement in this Comet Ping Pong place, and there are terrible things happening to children, and there's a cabal, and all this sort of stuff. That's a little hard to grasp, but I think the access in there is really through character. And so Hillary Clinton, for example, being characterized as this evil person, to use folklore kind of things, as a witch, then you say this story about her and you're like, sure, because

Starting point is 01:05:51 she's the devil. Or if, you know, someone else is sort of framed as a godlike character, they can do no wrong and you say some story about them that, you know, suggests they've done a bad thing. It's deflected, it's washed away. You know, what are the defense mechanisms built into stories is a really big part. You know, how do stories become hermetically sealed, or storyverses, if you like? Yeah, I mean, I know that politicians are very focused on the idea that they want to paint their opponents and themselves in certain ways. Like there's certain kinds of criticisms you can make that just don't stick to certain people because they don't fit the narrative in exactly that way.

Starting point is 01:06:30 And finding exactly that kind of weak point, how do you paint someone as a bad character in a way that is consistent with what people already think about them is one of the secrets to political success? One thing I've thought about with this danger of power kind of framework is a sort of flipping between saying your opponent is dangerous and weak. Yes. And it doesn't seem to matter, right? I mean, we sort of know that in politics, you can kind of say lots of things.

Starting point is 01:06:59 And if it doesn't stick, you know, you just keep moving on. But they're really trying, you know, in our framework, orthogonal attacks, right? They're sort of literally orthogonal attacks. So you're trying to say this person is, in a sense, quite powerful and dangerous. Or, you know, maybe the next day you want to say, you know, they're feckless and weak. And that's sort of a funny attack. So, really incredibly hard problems. And look, there's a huge danger to this as well, right? I mean, of course, being able to manipulate stories and kind of measure them and do all these

Starting point is 01:07:36 sorts of things and see what the weak points are in a system. Disinformation people work on this all the time. I think one of the, something I keep reflecting on is, you know, for scientists and journalists. So my wife's a journalist. And I always sort of think of journalists as scientists with a deadline. So we're trying to figure things out and tell the truth about something and kind of explain things, right?

Starting point is 01:08:00 You know, broadly is a big piece. That's a huge battle for us because the possible stories you can come up with that are not true, but are favorable to a viewpoint or a culture or whatever. There's just an incredible number of things. to maybe what's adjacent to the true story. You can really explore some stuff and find the stories that will spread faster, that will tack on to people's existing beliefs.

Starting point is 01:08:30 So that just is going to be a challenge. It's always been a challenge. It's just in sort of a time of so much information, right, so much availability, so much ability to curate and kind of create storyverses that are misleading online, that you can be taken into. We have to work really hard on this. This is hard.

Starting point is 01:08:52 So I went to the web page for the Hedonometer. I encourage everyone to check it out. And it's searchable, right? You can look for all sorts of wonderful things. And so just to normalize my own expectations, I searched for the frequency of the word quantum, because this is something I'm interested in and it doesn't have, you know, high rates of appearance in news stories, but occasionally. And what you find is probably what I should have expected ahead of time, which is that there's

Starting point is 01:09:16 sort of a baseline, which is pretty normal, and there are spikes. And the spikes are pretty extremely noticeable. But I don't know what the spikes are from, right? I mean, clearly one was from a story about a quantum computer or something like that. Maybe my book came out. I don't know, but that would be great. So how much of that sort of back engineering, reverse engineering, can you do? When you see something weird happening in the data, oftentimes in the big stories, you just know what it is, but are there objective procedures for figuring out why the words are shifting in these different ways on different days? Yeah, so a number of pieces here.
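The baseline-plus-spikes picture described here can be sketched in a few lines of Python. This is a generic robust outlier check (median and median absolute deviation), chosen for illustration; the function name, threshold, and counts are assumptions, and nothing here is claimed to be how the website itself works.

```python
from statistics import median


def find_spikes(daily_counts, threshold=5.0):
    """Return indices of days whose count sits far above the typical level.

    Uses a robust z-score: distance from the median, scaled by the median
    absolute deviation (MAD), so a few large spikes do not inflate the
    baseline estimate.
    """
    base = median(daily_counts)
    mad = median([abs(c - base) for c in daily_counts])
    scale = 1.4826 * (mad if mad > 0 else 1.0)  # guard against a flat series
    return [i for i, c in enumerate(daily_counts) if (c - base) / scale > threshold]


# Made-up daily counts of one word, with two obvious spikes.
counts = [10, 12, 11, 9, 10, 80, 11, 10, 12, 95, 10]
spike_days = find_spikes(counts)
```

Finding the spike days is the easy half; working out what happened on those days is the harder question taken up in the conversation.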

Starting point is 01:09:57 This has been an eternally interesting, difficult problem. What happened? Yeah. Like what happened, right? And I remember early on, you know, maybe 15, 20 years ago, looking through Google, trying to find out what happened on a particular day. Right. It was kind of hard. And then Wikipedia emerges.

Starting point is 01:10:14 and Wikipedia, of course, has entries for every date of history, sort of, I suppose, now. But certainly in modern times. And they're sort of weird lists. It's just a weird list of things that happened in the world. You know, there was a Star Trek convention. There was, you know, this war started. I mean, it's a real mixture. There was a concert.

Starting point is 01:10:34 So with some of what we've got there, Storywrangler, for example, if you click on a point, it will take you to Twitter and search Twitter for that date and it will sort of show you which tweets are being amplified on that day intentionally. So tweets get deleted. It's a bit of a problem. So maybe it doesn't hold up. But that's something where we,

Starting point is 01:10:56 and you know, it depends on the sort of restrictions. Yeah, we have 10% of all tweets going back to 2008, but we can't, you know, sort of share them and put them out wholesale. So we've tried to do something there by pushing it back into the,

Starting point is 01:11:11 you know, the actual structure itself of Twitter. Google Books, for example, you can't really, it's harder to do that, right? It's harder to go back and search for that. Google Trends, you kind of want to figure out, like, why is this thing being talked about? So at least I think with Twitter, we have that to some extent. We have another big body of work, which really is connected to this, trying to figure out exactly this, what happened on a particular day or in a particular week.

Starting point is 01:11:37 And we did it around Trump. Just, I mean, it's the president. It matters. It's a very good test case, and I think certainly 2015 to 2020, and kind of still now, really, what I called

Starting point is 01:11:54 a turbulent time. Story turbulence has been really high, right? So, excuse me, that turnover of stories has just been kind of incredible. And I remember sort of thinking in 2016, 17, like especially

Starting point is 01:12:08 2017, I think, can you remember what happened the last two weeks? Can you say, you know? And it was a challenge, right? There could be massive events like Space Force or something, or, you know, there's always something. Sort of, yeah, there's always something. And, you know, look, the world's a rich place, but it was an effort to sort of study that. So we have this thing, which is kind of computational timeline reconstruction, and it works through Twitter, but it could work through anything. So we could do it through, say, a state has archives, for example, going back in time, or any kind of news source, you know, maybe the New York Times.

Starting point is 01:12:46 What are the sort of narratively dominant terms, like words and pairs of words, that kind of pop up? And these kind of act then like keywords into bigger stories. So they're not telling you, say, you know, buying Greenland. It doesn't tell you like the whole story of Greenland. But it would, you know, this is referencing Trump, but it would sort of point it out. But, you know, what we were able to see there early on was that there was a lot of turbulence in that first year, 2017.

Starting point is 01:13:17 There was just a lot of changeover. There were also just natural disasters, Hurricane Maria and so on. So these things came on. But there was North Korea, you know, sort of provocations, and then Charlottesville happened the next week. And it's hard to remember the orderings of these things.

Starting point is 01:13:34 So we have a timeline that, you know, that kind of comes out computationally. And what you see in 2020 is just this really, just sudden change into coronavirus, as we called COVID coronavirus initially for many months, just being the dominant story every day for months. And we have a measure we call chronopathy. You can see that time functionally slowed down because there was just not so much turnover in stories. It was always the same dominant story.
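One simple way to put a number on "turnover in stories" is to compare each day's set of dominant terms with a later day's and ask how long it takes for the set to change substantially. The Python sketch below does this with Jaccard distance. It is an illustrative stand-in under assumed inputs (hand-made term sets, a 0.5 threshold), not the measure used in the work being discussed.

```python
def jaccard_distance(a, b):
    """1 minus the overlap of two term sets; 0 means identical, 1 means disjoint."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def days_to_turnover(daily_terms, start, target=0.5):
    """Days after `start` until the dominant-term set has changed by `target`.

    Returns None if the target turnover is never reached in the data.
    """
    for lag in range(1, len(daily_terms) - start):
        if jaccard_distance(daily_terms[start], daily_terms[start + lag]) >= target:
            return lag
    return None


# Made-up dominant terms per day: a fast-moving period and a stuck one.
fast_period = [{"a", "b"}, {"c", "d"}, {"e", "f"}]
stuck_period = [{"covid", "lockdown", "masks"}] * 3 + [{"protests", "floyd", "covid"}]

fast_lag = days_to_turnover(fast_period, 0)
stuck_lag = days_to_turnover(stuck_period, 0)
```

A longer lag for the same amount of turnover is one way to read "time functionally slowed down": the same change in stories takes more calendar days to arrive.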

Starting point is 01:14:03 George Floyd's murder explodes the narrative. And then, but that becomes stuck again too because that becomes a durable story. And then, of course, we get to the election and things. So you can quantify the impression we all have that sort of time froze once the pandemic hit. Right. And people said this. They said, you know, people say this anecdotally all the time. You know, yesterday felt like a week.

Starting point is 01:14:26 Yeah. I saw one tweet was like, I'm going to write an autobiography of the last 10 years of my life. It's called 2020. You know, like, so, I mean, it's a very physics-ish sort of thing, like time dilation and so on. But this was a memory, you know, at the population scale. Did things seem to really, you know, maybe it wouldn't have. Maybe, you know, there is this just sort of turnover. It doesn't really matter.

Starting point is 01:14:49 But in fact, yeah, 14 days in April 2020 is kind of, you would have the same sort of turnover in two days in 2017. And it's a weird thing. I know I had David Eagleman on the podcast a while ago, but like there's this weird mismatch between simultaneously, one says nothing is happening and time seems to last forever, right? Even though, so the rate at which time passes is sort of inverse to the rapidity with which things happen. An exciting movie seems to go by very quickly. Yeah, yeah, no, that can be, and it depends how much you're recording, you know, in your own mind,

Starting point is 01:15:25 right? There are studies of how people, yeah, so when something, it's usually around something sort of dire or terrible happening, you know, an accident, you have this seeming slow motion replay of it in your head, and it's because you were really kind of writing down the memory. I mean, that's what I understand, right? You were really recording it and kind of you have it in fine detail, but there's a lot that goes on in life, right? And we know we miss most of it, the sounds and the things. You know, they just sort of pour past us. There's too much to measure for one person. So our brains are pretty, well, problematically good, perhaps, at just ignoring things that don't fit our little narrative right now. I mean, there's certainly a lot that

Starting point is 01:16:05 you've covered that is going on here. I do, it's wonderful to have this conversation because I get the impression that a lot of the excitement of what you're doing is still ahead. Like we've just started picking some of the low-hanging fruit. But I guess one final question, which you did allude to earlier, but we can take these ideas and turn them around and put them to work, right? I mean, maybe either in artificial intelligence or in political campaigns or in writing a screenplay, you know, like, can we figure out, can we distill what would be the perfect narrative or the perfect time structure of valence or something like that? Are people trying to

Starting point is 01:16:42 operationalize these ideas in that sense? Yeah. So there's a lot of work over the years and you can maybe make a fair amount of money off saying you can predict which things will take off, right, because of your analytic tool. So, you know, I think what we can do is say, look, here's the shape of your story. You know, here are the kind of tropes you've used and so on, and this is how it compares to others around. And, you know, maybe give people that sort of a diagnostic like that. Now, in terms of making something take off for sure, well, so this is the problem, right? Reality is socially constructed.

Starting point is 01:17:20 We have older work on, yeah, we have older work on fame, and, of course, many people have kind of come to this in different ways, that show if you kind of basically run the world over and over for cultural, social things, it doesn't always, there's a lot of variability, right? Harry Potter doesn't win in every universe. It certainly didn't win for the first 12 or 13 editors who said no. How could they not know? How could they not know? They're professionals, right?

Starting point is 01:17:49 But how could they not know that this would be this giant thing? And that actually indicates how much fortune and luck. The fact that there's an enormous runaway success in something, right? The world is full of these, right? Where the number one thing is so much bigger than the second one and the third one, right, these heavy-tail distributions, that it's indicative of actually, what we'll tell is, you know, our simple story for that is the Mona Lisa is fantastic, for example. Like, it is intrinsically amazing.

Starting point is 01:18:20 You know, if you look at it, you will be transported and it's because of this and this and this. But we just leave out completely the social construction aspect. It's because, you know, it took 400 years to get to that idea that it was the greatest painting in the world. And there's a whole sort of set of reasons for why it became increasingly famous that are not intrinsic, you know, stories around it. But it's a good example of something where, you know, you can't really kind of, I mean, I guess that's the point of. You try to make things as good as you can and you want them, you want to make them spreadable. That's important that people want to tell other people about it. I think that's the great thing.
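The "run the world over and over" thought experiment can be made concrete with a small rich-get-richer simulation. In the Python sketch below, each new person picks an item with probability proportional to its intrinsic quality plus its current popularity, and the world is rerun with different random seeds. The qualities, `social_weight`, and run sizes are all invented for illustration; this is a toy model under those assumptions, not a reproduction of any study.

```python
import random


def run_world(qualities, n_choices, social_weight, seed):
    """Simulate one world; return how many times each item was chosen."""
    rng = random.Random(seed)
    counts = [0] * len(qualities)
    for _ in range(n_choices):
        # Appeal = intrinsic quality + social influence from past choices.
        weights = [q + social_weight * c for q, c in zip(qualities, counts)]
        pick = rng.choices(range(len(qualities)), weights=weights, k=1)[0]
        counts[pick] += 1
    return counts


def winners_across_worlds(qualities, n_worlds=200, n_choices=500, social_weight=1.0):
    """Rerun the world with different seeds; tally how often each item ends up on top."""
    tally = [0] * len(qualities)
    for seed in range(n_worlds):
        counts = run_world(qualities, n_choices, social_weight, seed)
        tally[counts.index(max(counts))] += 1
    return tally


# Three hypothetical items of nearly equal intrinsic quality.
one_world = run_world([1.0, 1.1, 0.9], n_choices=100, social_weight=1.0, seed=0)
tally = winners_across_worlds([1.0, 1.1, 0.9], n_worlds=20, n_choices=50)
```

Comparing `tally` for `social_weight=0.0` against a larger value is one way to explore how much the identity of the winner depends on social influence rather than on quality alone.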

Starting point is 01:18:59 And of course, that works for disinformation as well, right? So what will spread in the social wild, right? This is the great problem of advertising, right? This is sort of probably made up line, but half the money is wasted in advertising. We just don't know which half, right? And so, you know, it's sort of true. Like very unexpected things happen and take off. And how did you not know that people?

Starting point is 01:19:22 Well, there's a lot of social construction that goes on. But so it wouldn't be anything that would guarantee the future of, you know, of some social phenomena, but it would serve as a, I think, can serve as a diagnostic. I worry about the negative aspects, you know, I mean, you know, but I think we have, like all of science here, we have to know that we have to, we have to know the things, right, so that we can start to build defense systems. And I think AI, for example, or what we'll call AI or sort of certainly the modern work with language and all of these kind of crazy instruments. You know, they've gotten a little, they've gotten way ahead of us, right?

Starting point is 01:20:05 We're trying to make decisions about, you know, in juries or parole or something like that or presenting things that turn out to be deeply racist or whatever. And, you know, we've got ourselves way beyond describe and explain into the sort of create category of science. And, you know, we need, I think, I mean, I think it's turning around. People are looking at the corpora and so on that they've built some of these systems out of. And, you know, I'm really relieved to see that happen because I think there was a

Starting point is 01:20:39 wild time there. And we got ourselves, you know, like Facebook's algorithm, right? Which thing spread? Right. You have dials that you turn to make certain things spread or not spread. You can change the social contagion there. And that's, you know, it's, yes, there's money on one side, but there's also just, you know, does society hold together on, on another side?

Starting point is 01:21:00 And I think that's... I guess there's also a feedback question, right? I mean, there's this David Lodge novel I read from the 80s, and he mentions very, very early efforts in digital humanities where you would digitize someone's book and figure out what words they used more often than the typical English language. And this author was shown, you know, that you use the word, you know, moist or whatever, way more than average.

Starting point is 01:21:29 And once he found out those words, he couldn't write anymore, like, because he was too self-conscious about doing it. And I wonder if we figure out too much about, you know, what the shapes of these stories are and everything, how that's going to affect how we tell them ourselves. Yeah, there's some peril there, I suppose. Yeah, scientists, classic science move, just look too deep. It's like trying to understand comedy and destroy everything, right? So, yeah. Explaining the joke. Thanks, thanks, scientists. No, I think I would hope it would just get, I mean, people are incredibly creative. You know, people are incredibly creative.

Starting point is 01:22:04 We find new ways to tell stories. We're in a time where we have so many stories in the past that we kind of play with them and so on. You know, I don't think it will stop all of that. It could produce some stuff that's not very good, I think. That maybe is the problem, where you try to build formulas too much and so on. So that could be slightly more dangerous. Fair enough. Well, all right. I will just repeat. Thanks, scientists. I like that as a motto.

Starting point is 01:22:34 And Peter Dodds, thanks very much for being on the Mindscape podcast. Sean, it's been a great pleasure. Thank you.
