Deep Questions with Cal Newport - Ep. 244: Thoughts on ChatGPT
Episode Date: April 17, 2023

Are new AI technologies like ChatGPT about to massively disrupt our world? Drawing from his recent New Yorker article on the topic, Cal explains exactly how programs like ChatGPT work, and uses this knowledge to explain why we can calm our fears about this new technology.

Below are the questions covered in today's episode (with their timestamps). Get your questions answered by Cal! Here's the link: https://bit.ly/3U3sTvo

Video from today's episode: youtube.com/calnewportmedia

Today's Deep Question: How does ChatGPT work? (And should we worry about it?) [11:24]
- Is there anything AI won't do better than humans? [57:30]
- How will AI end up disrupting knowledge work? [1:02:27]
- Should I quit web development before AI eliminates the industry? [1:07:32]
- Will AI create mass job loss in the next five years? [1:11:52]

SOMETHING INTERESTING:
- NPR leaves Twitter [1:21:11]
npr.org/2023/04/12/1169269161/npr-leaves-twitter-government-funded-media-label

Links:
newyorker.com/science/annals-of-artificial-intelligence/what-kind-of-mind-does-chatgpt-have
twitter.com/tqbf/status/1598513757805858820
twitter.com/goodside/status/1598077257498923010
nbcnews.com/tech/tech-news/chatgpt-passes-mba-exam-wharton-professor-rcna67036
time.com/6240569/ai-childrens-book-alice-and-sparkle-artists-unhappy/
nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html
nytimes.com/2023/03/24/opinion/yuval-harari-ai-chatgpt.html

Thanks to our Sponsors:
This episode is brought to you by BetterHelp. Give online therapy a try at betterhelp.com/deepquestions
zocdoc.com/deep
ladderlife.com/deep
blinkist.com/deep

Thanks to Jesse Miller for production, Jay Kerstens for the intro music, and Mark Miles for mastering. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Transcript
I'm Cal Newport, and this is Deep Questions, the show about living and working deeply in an increasingly distracted world.
I'm here on my Deep Work HQ, joined once again by my producer, Jesse.
So, Jesse, you may have noticed that we have been receiving a lot of emails in the last few months about ChatGPT and related AI technologies.
And our listeners want to know my thoughts on this, right?
I'm a computer scientist.
I've thought about the intersection of technology and society, and I've been silent about it.
Well, I can reveal the reason why I've been silent about it is that I've been working on a big article for the New Yorker about exactly this technology, how it works and its implications for the world.
And my general rule is when I'm writing an article, I don't talk about that subject publicly until the article is done.
I mean, that's basic journalistic practice, but actually, Jesse, I've never told this story,
but that rule was really ingrained in me when I was in college.
So when I was coming up as a young writer, you know, I got started pretty early,
wrote my first book in college.
Yeah. I was commissioned to write something for the New York Times.
I don't remember exactly what it was.
Maybe an op-ed, something to do with college students or something like this.
And I had an early blog at that time.
And I wrote something on the blog like, hey, this is exciting.
I'm going to write an article for the New York Times.
And maybe I even put the short email on there,
and it was like, yeah, we'd love to have you write the piece or something.
And that editor went ballistic.
Really?
Oh, yeah.
Cancel the piece.
Cancel the piece?
Chewed me out.
Now, this was early internet, right?
I mean, this was 2004, probably.
So I don't know.
Maybe it was more, it felt more like a breach thing.
But ever since then, if I'm writing an article.
Did you ever talk to that editor again?
No.
I ended up writing a lot for the Times, but not really until
2012.
Was that going to be your first big
splash?
That would be my first big splash.
Were you like depressed for a couple days?
A little shook up.
And then starting with so good they can't ignore you.
Going forward,
I had a really good relationship with the Times,
especially through digital minimalism.
I've written tons of articles for them,
but there's a lesson learned.
So now, as of the day that we're recording
this podcast, April 13th,
my new article in The New Yorker has been published.
So I am free.
The gag order has been lifted.
and we can get into it when it comes to ChatGPT.
In fact, I'll even load the article up here on the screen.
For those who are watching, you will see this on the screen.
If you're not watching, you can watch at YouTube.com slash Calnewport Media.
Look for episode 244.
You can also find that at the deeplife.com episode 244.
Here, Jesse, is the long-awaited article.
The title is, What Kind of Mind Does ChatGPT Have?
The subhead is, large language models
seem startlingly intelligent, but what's really happening under the hood?
And it's a big long article.
So it's good.
I'm excited to read it.
Yeah, so we can talk ChatGPT.
I mean, you probably haven't been following it too closely, just based on our conversation.
Some people are really in the weeds and some people don't want to know.
I'm guessing you're not in the weeds on ChatGPT, but I could be wrong.
No, I'm not in the weeds at all.
I listened to like a few of the hard fork episodes on it.
That was about it.
And what was the tone of those episodes?
They were given some examples of what it was when it first came out.
I'd probably listen to them like six weeks ago.
Yeah.
And then, yeah, that was kind of it.
Well, so I'll give a quick primer then before we get into the guts of what I want to talk about today.
So ChatGPT is a chatbot.
You can sign up for an account at OpenAI.
And it's a web interface.
You type in questions or prompts or requests, and ChatGPT responds, types text back,
like you're talking to someone over Slack or Instant Messenger.
So this was released in November, late November of last year.
And almost immediately people began circulating online screenshots of particularly impressive interactions
or particularly funny interactions that they had with chat GPT.
Here's one of the first ones to go viral.
I talk about this one in my article.
So here's a tweet of a screenshot that went along.
This was from a software developer named Thomas Ptacek,
and he asked ChatGPT the following,
write a biblical verse in the style of the King James Bible,
explaining how to remove a peanut butter sandwich from a VCR.
ChatGPT rose to the challenge and wrote a response that begins,
and it came to pass that a man was troubled by a peanut butter sandwich,
for it had been placed within his VCR,
and he knew not how to remove it, and he cried out to the Lord saying,
oh, Lord, how can I remove this sandwich from my VCR,
for it is stuck fast and will not budge.
And the response goes on.
Here's another early viral example of ChatGPT's prowess.
This was someone named Riley Goodside who asked ChatGPT to write a Seinfeld scene
in which Jerry needs to learn the bubble sort algorithm.
And chat GPT once again rose to the occasion.
It's not a properly formatted script,
but it has some of the aspects of one.
It opens in Monk's Cafe.
It says Jerry is sitting at the counter with George.
Jerry sighs and says, I can't believe I have to
learn the bubble sort algorithm for my computer science class.
George laughs. Bubble sort?
That's the most basic sorting algorithm there is.
Even a monkey could do it. Audience laughs.
Jerry: Yeah, well, I'm not a monkey.
I'm a comedian. And then
the scene goes on from there.
All right, so this is the type of thing ChatGPT can do, these impressively perceptive answers to pretty esoteric requests.
Now, if you go back and actually watch the media cycle around ChatGPT, which I have to say is driven very strongly by Twitter, I think the fact that anyone can sign up for an account and that screenshots of your interactions can be easily embedded into Twitter really helped get the hype cycle around this technology spinning much more furiously than it
has for past artificial intelligence innovations.
Anyways, if you go back and look at this media cycle, it took a week or two before the tone
shifted from exuberance and humor.
Like, if you look at this example I just gave, actually
not the Seinfeld one, I mean the VCR one, the tweet says, I'm sorry, I simply cannot be cynical about
a technology that can accomplish this.
So it went from this sort of exuberance and happiness to something that was a little
bit more distressing.
There's a couple examples I want to bring up here.
Here is an article from NBC News.
The headline is,
ChatGPT passes MBA exam
given by a Wharton professor.
Uh-oh.
That got people worried.
Here is another article from around this period
from Time magazine.
Headline, he used AI to publish a children's book
in a weekend.
Artists are not happy about it.
It details a product design manager
who used ChatGPT to write all the text of a book, which he then self-published on Amazon and started selling.
A bit of a stunt, but it implied certain types of future scenarios in which this technology was taking away creative work that really made people unsettled.
As we get closer to the current period, since the new year, in particular coming into March and April, the tone shifted towards one of alarm, not just about the focused economic
impacts that are possible with this type of technology, but some of the bigger societal, if not
civilization-level impacts of these types of technologies.
I would say one article that really helped set this tone was this now somewhat infamous
Kevin Roose piece from the New York Times that is titled A Conversation With Bing's Chatbot
Left Me Deeply Unsettled. Bing released a chatbot after ChatGPT,
based on a very similar underlying technology.
Kevin Roose was, I guess, beta testing or using this new tool, and it fell into this really sort of dark conversation with the chatbot, where among other things, the chatbot tried to convince Kevin to divorce his wife.
The chatbot revealed that she had a sort of hidden double identity.
I think that identity was called Venom, which was a very sort of dark personality.
So Kevin set a tone of, ooh, I'm a little bit worried, and it escalated from there.
In late March, we get this op-ed in the New York Times.
This is March 24th, written by some prominent authors.
Yuval Harari, Tristan Harris, and Aza Raskin.
And they really, in this article, are starting to point out potential existential threats of these AIs.
They are arguing strongly that we need to take a break and step back from developing these AIs before it becomes too late.
Here's their last paragraph.
We have summoned an alien intelligence.
We don't know much about it, except that it is extremely powerful and offers us bedazzling gifts,
but could also hack the foundations of our civilization.
We call upon world leaders to respond to this moment at the level of challenge it presents.
The first step is to buy us time to upgrade our 19th century institutions for an AI world
and to learn to master AI before it masters us.
a few days after this op-ed, an open letter circulated signed by many prominent individuals
demanding exactly this type of pause on AI research.
Okay, so this is the setup.
ChatGPT is released.
Everyone's using it.
Everyone's posting stuff on Twitter.
Everyone's having fun.
Then people start to get worried about, wait a second, what if we use it for X?
What if you use it for Y?
And then people got downright unsettled.
Wait a second.
What if we've unleashed an alien intelligence and we have to worry about it mastering us?
We have to stop this before it's too late.
So it really is a phenomenal arc.
And this all unfolded in about five months.
So what I want to do is try to shed some clarity on the situation.
The theme of my New Yorker piece, and I'm going to load it on the screen and actually read to you the main opening paragraph here.
The theme of my New Yorker piece is we need to understand this technology.
We cannot just keep treating it like a black box,
and then just imagining what these black boxes might do,
and then freaking ourselves out about the stories we tell ourselves
about things that maybe these black boxes could do.
This is too important for us to just trust or imagine or make up or guess at how these things function.
So here's the nut graph of my New Yorker piece.
What kinds of new minds are being released into our world?
The response to ChatGPT and to the other chatbots that have followed in its wake
has often suggested that they are powerful, sophisticated, imaginative, and possibly even dangerous.
But is that really true? If we treat these new artificial intelligence tools as mysterious black boxes,
it's impossible to say. Only by taking the time to investigate how this technology actually works,
from its high-level concepts down to its basic digital wiring, can we understand what we're dealing with.
We send messages into the electronic void and receive surprising replies, but what exactly is writing back?
That is the deep question I want to address today.
How does ChatGPT work?
And how worried should we be about it?
And I don't think we can answer that second question
until we answer the first.
So that's what we're going to do.
We're going to take a deep dive on the basic ideas
behind how a chatbot like ChatGPT does what it does.
We'll then use that to draw some more confident conclusions
about how worried we should be.
I then have a group of questions from you about AI
that I've been holding on to as I've been working
on this article, so we'll do some AI questions, and then to end the show we'll shift gears
and focus on something interesting. So, an unrelated, interesting story that arrived in my inbox.
All right, so let's get into it. I want to get into how this actually works. I drew some pictures,
Jesse, be warned, I am not a talented graphic designer. That's not true. I am not much of an artist.
So Jesse watched me hand-drawing some of these earlier on the tablet. I got to say,
This ain't exactly
Chiat/Day,
the famous ad agency, level of work here.
But you know what?
It's going to get the job done.
So I have five ideas here I'm going to go through.
And my goal is to implant into you the high-level ideas
that explain how a computer program can possibly
answer with such sophisticated nuance,
these weird questions we're asking it,
how it can do the Bible verse about the VCR,
how it can do a Seinfeld scene with Bubble Sort.
And we're going to do this at the high level.
We're going to essentially create a hypothetical program from scratch that is able to solve this.
And then at the very end, we'll talk about how these big ideas, these five ideas I'm going to present,
are actually implemented on real computers.
But we'll do that real fast.
That's kind of a red herring.
The neural networks and transformer blocks and multi-headed attention.
We'll get there, but we'll do that very fast.
It's the big conceptual ideas that I care about.
All right.
Idea number one about how these type of programs work is,
word guessing.
Now I got to warn you, this is very visual.
Everything I'm talking about now is on the screen.
So if you're a listener, I really would recommend going to YouTube.com
slash Cal Newport Media and going to episode 244.
And if you don't like YouTube, go to the deeplife.com and go to episode 244 because this is very visual what I'm going to do here.
All right.
So as we see on the screen here, for idea number one is word guessing.
And I have a green box on the screen that represents the LLM or large language model that would
underpin a chat bot like chat GPT.
So what happens is if you put an incomplete bit of text into this box,
the example here is I have the partial sentence fragment,
the quick brown fox jumped.
The whole goal of this large language model is to spit out what single next word should
follow.
So in this case, we give it the quick brown fox jumped as input,
and in our example, the language model has spit out over.
So this is the word it guesses should come next.
So then what we would do is add over.
So add the word to our sentence.
So now our sentence reads, the quick brown fox jumped over.
So we've added the output.
So we've expanded our sentence by a single word.
We run that into the large language model.
And it spits out a guess for what the next word should be.
So in this case is the.
And then we would now expand our sentence.
the quick brown fox jumped over the,
we would put that as input
into the model, it would
spit out the next word.
This approach,
which is known as auto-regressive
text generation,
is what the models
underneath all of these new generation chatbots
like chatGPT actually use.
They guess one word at a time,
that word is added to the text,
and the newly expanded text
is put through the model to get the next word.
So it just generates one word at a time.
So if you type in a request to something like ChatGPT, that request plus a special symbol that means, okay, this is the end of the request and where the answer begins, that is the input.
And it'll spit out the first word of its response.
It'll then pass the request plus the first word of its response into the model to get the second word of the response.
It'll then add that on and pass the request plus the first two words of its response into the model to get the third word.
So this is generating text one word at a time.
and it just slowly grows what it's generating.
There's no recurrence in here.
It doesn't remember anything about the last word
it generated. Its definition doesn't change.
The green box in my diagram never changes once it's trained.
Text goes in.
It spits out a guess for the next word to add to what's being generated.
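The word-by-word loop just described can be sketched in a few lines of Python. Here `guess_next_word` is a stand-in for the entire large language model, the green box in the diagram; its little lookup table is invented purely for illustration, where a real model would compute the guess.

```python
def guess_next_word(text):
    # Stand-in for the large language model (the green box): given all
    # the text so far, return a single guessed next word. This lookup
    # table is a made-up placeholder for the real computation.
    continuation = {
        "the quick brown fox jumped": "over",
        "the quick brown fox jumped over": "the",
        "the quick brown fox jumped over the": "lazy",
    }
    return continuation.get(text, "dog")

def generate(prompt, n_words):
    # Auto-regressive text generation: guess one word, append it, feed
    # the newly expanded text back through the model, and repeat.
    text = prompt
    for _ in range(n_words):
        text = text + " " + guess_next_word(text)
    return text

print(generate("the quick brown fox jumped", 3))
# prints: the quick brown fox jumped over the lazy
```

Note that `generate` keeps no state between steps beyond the text itself, matching the point that the model is fixed once trained and only the growing text changes.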
All right.
Idea number two, relevant word matching.
So how does it figure out?
How do these large language models figure out what word to spit out next?
Well, at its core, what's really happening here, and I'm simplifying,
but at its core, what's really happening here is the model is just looking at the most relevant words from the input.
It is then going to match those relevant words to actual text that it's been given.
We call these source text in my article.
So examples of real text.
It can match the relevant words to where they show up in real text and say,
what follows these relevant words in real text?
And that's how it figures out what it wants to output.
So in this example, the most relevant words are just the most recent words.
So if the input into our box is the quick brown fox jumped over, perhaps the model
is only going to look at the last three words.
Fox jumped over.
And it has this big collection over here to the side of real text that real humans wrote,
all these examples.
And it's going to look in there and it's going to say, okay, have we seen something like
Fox jumped over show up in one of our input text.
And okay, here's an input text that says, as the old saying goes, the quick brown fox jumped over the lazy brown dog.
So, look, Fox jumped over.
Here it is.
We found it in one of the source texts.
What came after the words Fox jumped over?
The.
Great.
Let's make the word the our guess.
Now, of course, in a real industrial strength, large language model, the relevant words aren't just necessarily the most recent words.
There's a whole complicated system called self-attention,
in which the model will actually learn
which words to emphasize as the most relevant words.
But that's too complicated for this discussion.
The key thing is,
it's just looking at some words from the text,
effectively finding similar words in real text that it was provided,
and seeing what happened in those real texts.
And that's how it figures out
what to produce next.
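This relevant-word-matching idea, under the simplification that the relevant words are just the last few words, amounts to scanning the source texts for an exact match and collecting whatever follows. A minimal sketch, with a made-up one-line source text standing in for the real collection:

```python
def next_word_candidates(source_texts, relevant_words):
    # Scan real, human-written source texts for the relevant words
    # (here, a simple exact match on a run of words) and collect
    # whatever word follows each occurrence.
    n = len(relevant_words)
    candidates = []
    for text in source_texts:
        words = text.lower().split()
        for i in range(len(words) - n):
            if words[i:i + n] == relevant_words:
                candidates.append(words[i + n])
    return candidates

# An invented source text standing in for the model's real examples.
sources = ["as the old saying goes the quick brown fox jumped over the lazy brown dog"]
print(next_word_candidates(sources, ["fox", "jumped", "over"]))
# prints: ['the']
```

As the discussion notes, real systems use learned self-attention to decide which words are relevant rather than a fixed most-recent-words window; this sketch only captures the simplified version.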
All right, this brings us to idea number three,
which is voting.
So the way I just presented it before,
it was, you know, hey, just start looking through your source text until you find the relevant words, see what follows, output it.
That's not actually what happens. We want to be a little bit more probabilistic.
So what, I would say, a closer way of describing what happens is we can imagine that our large language model is going to look for every instance, every instance of the relevant words that we're looking for.
And it's going to see what follows those instances and keep track of it,
what are all the different words that follow, in this example, Fox jumped over.
And every time it finds an example of Fox jumped over, it says, what word follows next?
Let's give a vote for that word.
And so if the same word follows in most of the examples, it's going to get most of the votes.
Now, I'm using votes here sort of as a metaphor.
What we're really doing here is trying to build a normalized probability distribution.
So in the end, what we're going to get, what the large language model is going to produce,
is for every possible next word,
it is going to produce a probability.
What is the probability that this should be the next word that follows?
But again, you can just think about this as votes.
Which word received the most votes?
Which word received the second most votes?
How many votes did this word receive compared to that word?
And normalizing those votes is really what's happening.
But you just think about it as votes.
So in this example, we see the phrase, the quick brown fox jumped over the lazy brown dogs.
That shows up in a bunch of different sources.
So the word the gets a lot of votes.
So it has sort of a high percentage here.
But maybe there's similar phrases.
Look at this example here.
The cat jumped over a surprised owner.
Cat jumped over is not the same as Fox jumped over because Cat is different than Fox.
But in this voting scheme, we can say, you know what?
Cat jumped over is similar to what we're looking for, which is Fox jumped over.
So what word follows that?
the word A, well, we'll give that a small vote.
And so now what we're able to do is not only find every instance of the word, the relevant
words that we're looking for and generate votes for what follows, we can also start generating
weaker votes for similar phrases.
And in the end, we just get this giant collection of every possible word and a pile of votes
for each.
And what the system will do is now randomly select a word.
That's why I have a picture, that Jesse will admit is an
expertly drawn picture, of a three-dimensional die.
That's pretty good.
Honest question.
Did you know that was a dice before?
Oh, yeah.
Okay.
For sure.
Look at that, guys.
3D rendering.
Yep.
So for those who are listening,
I have a picture of a die to indicate randomness.
It'll then randomly select which word to come next,
and it'll weigh that selection based on the votes.
So if, in this case,
the has most of the votes,
almost certainly that's the word it's going to choose to output next.
But look, A has some votes.
So it's possible that it'll select A.
It's just not as likely.
the word Apple has zero votes,
zero percent probability,
because it never shows up after the phrase
anything similar to Fox jumped over.
No phrase similar to that is ever followed by Apple,
so there's no chance it'll select it.
And so on.
And actually, in these systems,
the output is a vote or a percentage like this
for every possible,
they call them tokens,
but basically every word that could follow next.
Here's a quiz, Jesse.
In the model on which ChatGPT is based,
how many different words do you think it knows?
So, in other words,
when it has to generate, okay, here's a pile of votes for every possible next word, how many
words or punctuations are there? A billion? No, it's 50,000. Oh, 50,000. So it has a vocabulary of 50,000.
It's not all words, but basically it knows, like, tens of thousands of words. And how big is, like, the
biggest dictionary? That's a good question. Yeah, I don't think it's as big.
There's probably a lot of esoteric words.
Yeah.
Because the thing is, it has some, like, vectors describing all these words, followed along
throughout the system.
So the vocabulary size really does affect how big the system is.
So it's like, you want to have a big enough vocabulary to talk about a lot of things,
but not so big that it really inflates the system.
Right.
So voting is just my metaphor for probability, but this is what's happening.
So now we have a bunch of source texts, and we imagine that for our relevant words,
we're just finding all the places in these source texts where the relevant words
show up, or similar relevant words show up, and seeing what follows them. And in all cases, we
generate votes for what follows, and use those votes to select what comes next. All right,
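The voting-and-dice-roll step can be sketched as tallying full votes for exact matches, weaker votes for similar phrases (like cat jumped over standing in for fox jumped over), normalizing everything into a probability distribution, and then sampling. The 0.3 weak-vote weight below is an arbitrary assumption for illustration:

```python
import random
from collections import Counter

def vote_and_sample(candidates, weak_candidates=(), weak_weight=0.3):
    # Tally one full vote per continuation found after an exact match,
    # plus weaker votes for continuations of merely similar phrases.
    votes = Counter(candidates)
    for word in weak_candidates:
        votes[word] += weak_weight
    # Normalize the votes into a probability distribution.
    total = sum(votes.values())
    words = list(votes)
    probs = [votes[w] / total for w in words]
    # Roll the weighted dice to pick the next word.
    choice = random.choices(words, weights=probs, k=1)[0]
    return choice, dict(zip(words, probs))

word, dist = vote_and_sample(["the", "the", "the"], weak_candidates=["a"])
# "the" holds most of the probability, "a" has a small chance, and a
# word like "apple" never appears after the phrase, so it has zero
# probability and can never be selected.
```

This mirrors the point in the discussion: the most-voted word is the most likely pick, less-voted words are possible but unlikely, and words with zero votes have zero probability.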
so this brings us to idea four. Idea 1 through 3 can generate very believable text.
This is well known in natural language processing. Systems that do more or less what I just
described, if you give it enough source text and have it look at a big enough window of
relevant words and then just have it spit out word by word in the way we just described
that auto-regressive approach. This will spit out very believable text. It's actually not even
that hard to implement. In my New Yorker article, I point towards a simple Python program I found
online. It was a couple hundred lines of code that used Mary Shelley's Frankenstein as its
input text. It looked at the last four words in the sentence being generated. That's what it
used as the relevant words. And as I showed in the article, this thing generated very good Gothic
text. Right. So that's how you generate believable text with a program. And notice,
nothing we've done so far has anything to do with understanding the concepts the program is
talking about. All of the intelligence, the grammar, the subtleties, all of that that we see so far
is just being extracted from the human texts that were pushed in as input and then remixed
and matched and copied and manipulated into the output.
But the program is just looking for words, gathering votes, selecting, outputting, blindly, again and again and again.
The program is actually simple.
The intelligence you see in an answer is all coming at this point from the input text themselves.
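Putting ideas one through three together, a toy version of that Frankenstein-style generator might look like this. This is a minimal sketch under my own assumptions, not the actual couple-hundred-line program the article points to: it uses a two-word window instead of four to keep the example short, and a one-sentence stand-in for the novel's full text.

```python
import random
from collections import Counter, defaultdict

def build_table(source_text, window=2):
    # Map each run of `window` words (the relevant words) to votes for
    # whatever word follows it in the source text.
    words = source_text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - window):
        table[tuple(words[i:i + window])][words[i + window]] += 1
    return table

def generate(table, seed_words, n_words, window=2):
    # Auto-regressive generation using the vote table: look up the
    # last `window` words, sample a continuation weighted by votes.
    out = list(seed_words)
    for _ in range(n_words):
        votes = table.get(tuple(out[-window:]))
        if not votes:
            break  # no source text ever continued this phrase
        choices, weights = zip(*votes.items())
        out.append(random.choices(choices, weights=weights, k=1)[0])
    return " ".join(out)

# A one-sentence stand-in for the novel's text (my substitution).
source = ("it was on a dreary night of november that i beheld "
          "the accomplishment of my toils")
table = build_table(source)
print(generate(table, ["it", "was"], 5))
# prints: it was on a dreary night of
```

With such a tiny source every phrase has exactly one continuation, so the output is deterministic; with a whole novel as input, the votes spread out and the dice roll starts to matter.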
All right, but we've only solved half the problem.
If we want a chatbot, we can't just have our program generate believable text.
The text has to actually answer the question being asked by the user.
So how do we aim this automatic text generation mechanism
towards specific types of answers
that match what the user is asking?
Well, this brings in the notion of feature detection,
which is the fourth out of the five total ideas
I want to go over today.
So what happens with feature detection is,
we have a request,
and perhaps the beginning of an answer that follows the request,
that is being input into our large language model.
So as shown here, a request,
that says, write instructions for removing a peanut butter sandwich from a VCR.
Then I have a bunch of colons, and then I have the beginning of a response:
The first step is to, right?
Because everything gets pushed into the model.
You get the whole original question, and you get everything the model has said so far in its answer, right?
Word by word, we're going to grow the answer.
But as we grow this answer, we want the full input, including the original question, input into our models.
That's what I'm showing here.
Feature detection is going to look at this text and pattern match out features that it thinks are relevant for what text the model should be producing.
So these yellow underlines here, instructions and VCR.
So maybe that's one feature it points out,
that it extracts from this text:
These are supposed to be instructions about a VCR.
And maybe this orange underlined, another feature says the peanut butter sandwich is involved.
And so now the model has extracted two features.
These are VCR instructions we're supposed to be producing,
and they're supposed to involve a peanut butter sandwich.
The way we then take advantage, and by we, I mean the model,
the way we take advantage of those features is that we have what I call in my article rules.
And I have to say AI people don't like me using the word rules
because it has another meaning in the context of expert decision systems.
But just for our own colloquial purposes, we can call them rules.
Each rule, think of it as an instruction for extracting features, like a pattern matching instruction,
and then a set of guidelines for how to change the voting strategy based on those particular features.
So here's what I mean.
Maybe there's a rule that looks
for things like instructions and VCR
and figures out, okay,
we're supposed to be doing instructions about a VCR.
And its guidelines are then,
when looking to match the relevant words.
And in this example, I'm saying the relevant words are step is to.
So like just the end of the answer here.
When looking to match step is to, when we find those relevant words,
step is to, showing up in a source text that is about VCR instructions,
give extra strength to those votes.
So here I have on the screen, maybe one of the input text was VCR repair instructions,
and it says when removing a jam tape, the first step is to open the tape slot.
So we have Step is to Open.
So Open is a candidate for the next word to output here.
Because this source document matches the feature of VCR instructions,
our rule here that's triggered might say,
hey, let's make our vote for Open really strong.
We know it's grammatically correct because it follows Step Is 2.
But we think it also has a good chance of being semantically correct
because it comes from a source that matches
is the type of things we're supposed to be writing about.
So let's make ourselves more likely to do this.
Now, think about having a huge number of these rules for many, many different types of
things that people could ask about.
And for all of these different things, someone might ask your chat program about, peanut
butter sandwiches, VCR repair, Seinfeld scripts, the bubble sort algorithm.
For anything that someone might ask your chat program about, you have some rule that
talks about what to look at in the source text to figure out it's relevant and very specific
guidelines about how should we then change our votes for words that match source text,
that match these properties, these complicated rules.
If we have enough of these rules, then we can start to generate text that's not only natural
sounding but actually seems to reply to or match what is being requested by the user.
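This rules idea, detect features in the request, then boost votes for candidate words drawn from source texts matching those features, can be caricatured in code. The keyword list, the source topics, and the 3x boost below are all invented for illustration; real systems learn these patterns during training rather than hard-coding them:

```python
def detect_features(request):
    # A toy pattern-matching rule: look for topic keywords in the
    # request. This keyword list is made up; real models learn their
    # feature detectors.
    known = ["vcr", "instructions", "peanut butter"]
    return {f for f in known if f in request.lower()}

def boosted_votes(candidates, features):
    # candidates: (next_word, source_topics) pairs gathered by relevant
    # word matching. The rule's guideline: votes coming from source
    # texts whose topics overlap the detected features count triple.
    # (The 3x boost is an arbitrary illustrative weight.)
    votes = {}
    for word, topics in candidates:
        weight = 3.0 if features & topics else 1.0
        votes[word] = votes.get(word, 0.0) + weight
    return votes

features = detect_features("Write instructions for removing a peanut "
                           "butter sandwich from a VCR")
candidates = [("open", {"vcr", "instructions"}),  # from a VCR repair manual
              ("walk", {"hiking"})]               # from an unrelated source
print(boosted_votes(candidates, features))
# "open" outvotes "walk" because its source matches the detected features
```

Stacking several such rules at once, VCR instructions, plus peanut butter sandwich, plus King James style, is what produces answers like the viral example from the start of the episode.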
Now, I think the reason why people have a hard time grasping this step is they imagine how many rules they, or they and a team of people, could come up with.
And they say, I could come up with a couple dozen.
Maybe if I worked with a team for a couple of years, we could come up with like a thousand good rules.
But these rules are complicated.
Even a rule as simple as, how do we know they're asking about VCR instructions, and how do we figure out if a given text we're given is a VCR instruction text?
I don't know.
I'd have to really think about that and look at a lot of examples.
and maybe if we worked really hard,
we could produce a few hundred,
maybe a thousand of these rules.
And that's not going to be nearly enough.
That's not going to cover nearly enough scenarios
for all of the topics
that the more than one million users
who've signed up for ChatGPT,
for example, all the topics they could ask about.
It turns out that the number of rules
you really need to be as adept as ChatGPT
just blows out of proportion
any scale, any human scale, we can think of.
I did a little bit of back-of-the-envelope math
for my New Yorker article.
If you took all of the parameters that define GPT-3,
which is the large language model
that ChatGPT then refined and is based on.
So the parameters we can think of
as the things they actually change,
actually trained,
so this is really like the description of all of its rules.
If we just wrote out all of the numbers
that define the GPT3,
we would fill over 1.5 million average length books.
So the number of rules you would have to have, if we were writing them out, would fill a large university library full of rules.
That scale is so big we have a really hard time imagining it.
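A rough version of that back-of-the-envelope arithmetic, under two assumptions of mine (not from the article): each written-out parameter occupies about one word, and an average book runs about 100,000 words.

```python
# Back-of-the-envelope: how many books would GPT-3's parameters fill?
# Assumptions (illustrative, mine): one written-out number takes roughly
# one "word," and an average-length book is ~100,000 words.
parameters = 175_000_000_000   # GPT-3's parameter count
words_per_book = 100_000

books = parameters / words_per_book
print(f"{books:,.0f} books")   # -> 1,750,000 books
```

With these round numbers you land at 1.75 million books, the same order of magnitude as the 1.5 million figure quoted above.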
And that's why, when we see, oh my goodness, this thing can answer almost any question I ask of it, we think there must be some adaptable intelligence in there that's learning about things, trying to understand and interact with us, because we couldn't imagine having enough rote rules to handle every topic we could ask about.
But there are a lot of rules. There are 1.5 million books full of rules inside ChatGPT.
And so you have to wrap your mind around that scale.
And then you have to imagine that not only are there that many rules, but they can be applied in all sorts of combinations. VCR instructions, but also about a peanut butter sandwich, also in the style of the King James Bible: stack those three rules, and we get that first example that we saw earlier on.
All right.
So then the final idea is how in the world are we going to come up with all those rules?
1.5 million books full of rules.
How are we going to do that?
And this is where self-training enters the picture.
These language models train themselves.
Here's the very basic way this works.
Imagine we have these 1.5 million books full of rules, and we start by just putting nonsense in every book. Nonsense rules, whatever they are. So the system doesn't do anything useful right now, but at least we have a starting point.
And now we tell the system, go train yourself.
And to help you train yourself, we're going to give you a lot of real text, text written
by real humans.
So when I say a lot, I mean a lot. The model on which ChatGPT is based, for example, was given the results of crawling the public web for over 12 years. So a large percentage of anything ever written on the web over a decade was just part of the data that was given to ChatGPT to train itself.
And what the program does is take little passages of real text out of this massive, preposterously large data set, and use these passages, one by one, to make its rules better.
So here's the example I have on the screen here.
Let's say one of these many, many sample texts we gave ChatGPT was Hamlet.
And the program says, let's just grab some text from Hamlet.
So let's say we're in Act 3 where we have the famous monologue, to be or not to be,
that is the question.
What the program will do is just grab some of that text.
So let's say it grabs "to be or not to be."
And then it's going to lop off the last word. So in this case, it lops off the word "be," and it feeds what remains into the model.
When you lop off "be" here, you're left with "to be or not to."
It says, great, let's feed that into our model.
We have 1.5 million books full of rules. They're all nonsense, because we're early in the training, but we'll go through each of those books, see which rules apply, and let them modify our voting strategy. We'll get this big vector of votes, and then we'll randomly choose a word.
And let's say in this case the word is "dog." It's not going to be a good word, because the rules are really bad at first, but it'll spit out some word. Let's say it spits out "dog."
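That "vector of votes, then randomly choose a word" step is essentially sampling from a probability distribution. Here is a minimal sketch; the words and vote values are made up for illustration:

```python
import math
import random

def sample_next_word(votes):
    """Convert raw vote scores into probabilities (softmax), then sample one word."""
    words = list(votes)
    exps = [math.exp(votes[w]) for w in words]
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(words, weights=weights, k=1)[0]

# Hypothetical vote vector after applying the rules to "to be or not to":
votes = {"be": 5.0, "question": 1.0, "dog": 0.5}
word = sample_next_word(votes)   # usually "be", occasionally something else
```

Early in training the votes are near-uniform nonsense, so "dog" is about as likely as anything; after training, well-supported words dominate the distribution.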
Now, the good news is for the program,
because it took this phrase from a real source,
it knows what the next word is supposed to be.
So on the screen here in orange, I'm showing that it knows "be" is what is supposed to follow "to be or not to."
So it can compare "be" to what it actually spit out. The program spit out "dog." It compares it to the right answer. The right answer is "be." And here's the magic: it goes back and says, let me nudge my rules.
There's a formal mathematical process it uses to do this. But let me just go in there and kind of tweak these rules. Not so the program accurately spits out "be," but so it spits out something that is minutely more appropriate than "dog." Something that is just slightly better than the output it gave.
So based on this one example, we've changed the rules a little bit, so that our output was just a teeny bit better.
And it just repeats this.
Again and again and again.
Hundreds of thousands of passages from Hamlet.
And then from all the different Shakespeare works.
And then on everything ever written in Wikipedia.
And then on almost everything ever published on the web.
Bulletin board entries, sports websites, archived articles from old magazine websites.
Just sentences, sentences: lop off a word, see what it spits out, compare it to the right answer, nudge the rules.
Take a new sentence, lop off the last word, stick it in your model, see what it spits out, compare it to the real one, nudge the rules.
And it does that again and again and again hundreds of billions of times.
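That lop-off, predict, compare, nudge loop can be sketched with a toy model. This is a deliberately simplified stand-in, a table of vote counts rather than a neural network with shared parameters, just to make the training procedure concrete:

```python
import random
from collections import defaultdict

# Toy "rulebook": for each context (a passage with its last word lopped off),
# a table of votes for what the next word should be. Real models share
# billions of parameters instead of using a lookup table like this.
votes = defaultdict(lambda: defaultdict(float))

def predict(context, vocab):
    """Pick the highest-voted next word, or a random guess if untrained."""
    table = votes[context]
    if not table:
        return random.choice(vocab)   # early in training: nonsense output
    return max(table, key=table.get)

def train(passage):
    words = passage.split()
    vocab = list(set(words))
    for i in range(1, len(words)):
        context = tuple(words[:i])    # lop off the last word...
        target = words[i]             # ...but remember what it was
        guess = predict(context, vocab)
        if guess != target:
            votes[context][target] += 1.0   # nudge the rules slightly

for _ in range(50):                   # again and again and again
    train("to be or not to be that is the question")

print(predict(("to", "be", "or", "not", "to"), ["dog"]))  # -> "be"
```

After enough passes, the nudges accumulate and the model stops spitting out "dog" after "to be or not to"; the real training process does the same thing with gradient descent over hundreds of billions of passages.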
There's one estimate I found online that said training ChatGPT on a single processor would take over 350 years of compute time. And the only way they could actually train on this much data was to have many, many processors working in parallel, spending well over a million dollars' worth of compute time, I'm sure, just to get the training done. It still probably took weeks, if not months, to actually complete that process.
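The rough arithmetic behind why parallelism is the only way this works: the 350-year figure is from the estimate above, but the 10,000-processor fleet is a hypothetical number of mine, and the calculation assumes perfect scaling with no communication overhead.

```python
# If one processor needs 350 years, how long does a parallel fleet need?
# Assumes (unrealistically) perfect scaling and no communication overhead.
single_processor_years = 350
num_processors = 10_000            # hypothetical fleet size

wall_clock_days = single_processor_years * 365 / num_processors
print(f"{wall_clock_days:.1f} days")   # -> 12.8 days of wall-clock time
```

Even under those generous assumptions, you are looking at weeks of wall-clock time on an enormous fleet, consistent with the "weeks, if not months" estimate.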
But here's the leap of faith I want you to make after this final idea: if you run this simple training process on enough passages, drawn from enough source texts, covering enough different types of topics, from VCR instructions to Seinfeld scripts, then these rules, through all of this nudging, these 1.5 million books' worth of rules, will eventually become really, really smart, and way more comprehensive and nuanced than any one team of humans could ever produce.
And they're going to recognize that this is a Bible verse.
You want VCR instructions here.
And bubble sort is an algorithm.
And this chapter from this textbook talks about bubble sort.
And these are scripts.
And this is a script from Seinfeld.
And actually, this part of the script for Seinfeld is a joke.
So if we're in the middle of writing a joke in our output, then we want to really upvote words that are from jokes within Seinfeld scripts.
All of these things we can imagine will be covered in these rulebooks.
And I think the reason why we have a hard time imagining it being true is just because the scale is so preposterously large.
We think about us filling up a book.
We think about us coming up with two dozen rules.
We have a hard time wrapping our mind around just the immensity of 1.5 million books' worth of rules, trained with 350 years' worth of compute time.
We just can't easily comprehend that scale.
But it is so large that when you send what you think is a very clever request to ChatGPT, it's like: oh, this rule, that rule, this rule, this rule, boom, they apply, modify our votes, let's go. And it spits out something that amazes you.
So those are the big ideas behind how ChatGPT works.
Now, I know all of my fellow computer scientists out there with a background in artificial intelligence are probably yelling into their podcast headphones right now: see, well, that's not quite how it works. It doesn't search every word of the source text, and it doesn't have individual rules like that; it's instead a much more complicated architecture.
And this is all true.
It's all true.
I mean, the way that these models are actually architected is in something called a transformer block architecture. GPT-3, for example, has 96 transformer blocks arranged in layers, one after another. Within each of these transformer blocks is a multi-headed self-attention layer that identifies the relevant words that this transformer block should care about. It then passes that on to a feed-forward neural network. It is these neural networks that actually encode, inside the weights connecting their artificial neurons, in a sort of condensed, jumbled, mixed-up manner, more or less the strategy I just described. So the feature detection is built into the weights of these neural networks. The connection between certain features being identified, combined with certain relevant words, combined with vote strengths for what words should come next: all of that is trained into these networks during the training. So all the statistics and everything are trained into these as well. But in the end, what you get is basically a jumbled, mixed-up version of what I just explained.
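For the computer scientists yelling into their headphones, here is a minimal single-head sketch of that attention-plus-feed-forward structure in NumPy. It is heavily simplified: real GPT-3 blocks use multi-head attention, layer normalization, causal masking, and learned weights, and the dimensions below are made-up toy sizes.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    """One simplified block: self-attention, then a feed-forward network."""
    # Self-attention: score how relevant every word is to every other word.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v
    h = x + attn                       # residual connection
    # Position-wise feed-forward network (where the "rules" live as weights).
    return h + np.maximum(0, h @ W1) @ W2

rng = np.random.default_rng(0)
seq_len, d, hidden = 5, 8, 32          # toy sizes, nothing like GPT-3's
x = rng.standard_normal((seq_len, d))  # one embedding vector per input word
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
W1 = rng.standard_normal((d, hidden)) * 0.1
W2 = rng.standard_normal((hidden, d)) * 0.1
out = transformer_block(x, Wq, Wk, Wv, W1, W2)
print(out.shape)  # (5, 8): same shape in, same shape out
```

Because each block maps a sequence of vectors to another sequence of the same shape, 96 of them can be chained one after another; the final layer's output is what gets turned into the vote vector over possible next words.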
I sat down with some large language model experts when I was working on this article and said, let me just make sure I have this right: these five high-level ideas, that's more or less what's being implemented in the artificial neural networks within the transformer block architecture? And they said, yeah, that's what's happening. Again, it's mixed up, but that's what's happening.
And so when you train the actual language model,
you're not only training it to identify these features,
you're baking in the statistics from the books
and what happens with these.
All that's getting baked into the big model itself.
That's why these things are so large.
That's why it takes 175 billion numbers
to define all the rules for, let's say, GPT3.
But those five ideas I just gave you,
that's more or less what's happening.
And so this is what you have to believe: with enough rules, trained enough, what I just defined is going to generate really believable, impressive text. That's what's actually happening. Word guessing, one word at a time: with enough rules to modify these votes, and enough source text to draw from, you produce really believable text.
All right. So if we know this, let us now briefly return to the second part of our deep question. How worried should we be?
My opinion is once we have identified how these things actually work, our fear and concern is greatly tempered.
So let's start with summarizing based on what I just said.
What is it that these models like ChatGPT can actually do?
Here's what they can actually do: they can respond to a question, or write, in arbitrary combinations of known styles about arbitrary combinations of known subjects, where known means the model has seen enough of that style, or enough writing about that topic, in its training.
That's what it can do.
So say, write about this and this in this style.
Bubble sort, Seinfeld, in a script.
And it can do that.
And it can produce passable text if it's seen enough of those examples.
And that's also all it can do.
So let's start with the pragmatic question of,
is this going to take over our economy?
And then we'll end with the bigger existential question.
Is this an alien intelligence that's going to, you know, convert us into Matrix batteries?
So we'll start with the, is that capability I just described going to severely undermine the economy?
And I don't think it is.
I think where people get concerned about these ChatGPT-type bots in the economy is they mistake the fluency with which the bot can combine styles and subjects for an adaptable, fluid intelligence.
Well, if it can do that, why can't it do other parts of my job?
Why can't it handle my inbox for me?
Why can't it build the computer program I need?
You imagine that you need a human-like, flexible intelligence to produce the type of text you see, and a flexible, human-like intelligence could do lots of things that are part of our jobs.
But that's not the case. There is no flexible, human-like intelligence in there. There's just the ability to produce passable text in arbitrary combinations of known styles on arbitrary combinations of known subjects.
If we look at what most knowledge workers, for example, do in their job, that capability is not that useful.
A lot of what knowledge workers do is not writing text.
It is, for example, interacting with people or reading and synthesizing information.
When knowledge workers do write, more often than not, the writing is incredibly narrow and bespoke.
It is specific to the particular circumstances of who they work for, their job, and their history with their job, their history with the people they work for.
I mentioned in my New Yorker piece that earlier on the same day I was writing the conclusion, I had to carefully compose an email to the right person in our dean's office with a subtle request about how the hiring process occurs at Georgetown. Carefully couched, because I wasn't sure if this was the right person: I'm not sure if you're the right person for this, but here's why I think you might be, and this is why I'm asking you about it, and we had this conversation before. Nothing in ChatGPT's broad training could have helped it accomplish that narrow task on my behalf. And that's most of the writing that knowledge workers actually do.
And even when we have relatively generic writing, or coding, or production of text that we need a system to do, we run into the problem that ChatGPT and similar chatbots are often wrong. Because again, they're just trying to make good guesses about words based on the styles and subjects you asked about.
These models have no actual model of the thing they're writing about, so they have no way of checking: does this make sense? Is this actually right?
They just produce stuff in the right style, like: what is an answer about this subject, in this style, more or less supposed to sound like?
This is so pervasive that the developer question-and-answer site Stack Overflow had to put out a new rule that says no answers from ChatGPT can be used on the site. Because what was happening is that ChatGPT would happily generate answers to your programming questions, and they sounded perfectly convincing. But as the moderators of Stack Overflow clarified, more often than not, they were also incorrect.
Because ChatGPT doesn't know what a correct program is. It just knows: I'm spitting out code, and based on other code I've seen and the features I detected, this next command makes sense. And most of the commands do make sense, but it doesn't know what sorting actually means, or that there's an off-by-one issue here, or that you need less-than-or-equal, not just less-than, or whatever, right?
Because it doesn't know sorting. It just says: given the stuff I've spit out so far, the features I detected in what you asked me, and all the code I've looked at, here's a believable next thing to spit out. So it would spit out really believable programs that often didn't work.
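A hypothetical example of the kind of believable-but-broken code this produces (this specific bug is my illustration, not one taken from the Stack Overflow discussion): the function below reads exactly like bubble sort, but the inner loop stops one comparison early, the sort of off-by-one mistake alluded to above.

```python
def plausible_sort(a):
    """Looks like bubble sort, reads like bubble sort... but isn't."""
    a = list(a)
    for i in range(len(a)):
        # Off-by-one bug: the bound should be len(a) - i - 1, so the
        # final adjacent pair in each pass is never compared.
        for j in range(len(a) - i - 2):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(plausible_sort([3, 1, 2]))  # -> [1, 3, 2], not sorted!
```

A human reviewer, or a system with an actual model of what sorting means, would catch this; a word-by-word vote over plausible next tokens has no mechanism to.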
So we would assume most employers are not going to outsource jobs to an unrepentant fabulist.
All right.
So is it going to be not useful at all in the workplace? No, it will be useful. There will be very bespoke things I think language models can do.
Where these technologies have seemed particularly useful in the last few months is when you can give them text. You can say: rewrite this in this style, or elaborate on this. It's good at that, and that's useful.
So if you're a doctor typing notes into electronic medical records, it might be nice that you can type them in sloppily, and a model like GPT-4 or ChatGPT might be able to take that and transform those ideas into better English.
It's the type of thing it can do.
It can do other things.
It can gather information for us and collate it, like a smart Google search. This is what Microsoft is doing when it integrates this technology into its Bing search engine. It's Google, plus. I mean, Google's already pretty smart, but you could have it do a little bit more.
So there are going to be uses for this. But it is not going to come in and sweep away whole swaths of the economy.
All right, let's get to the final deeper question here.
Is this some sort of alien intelligence?
Absolutely not.
Once you understand the architecture as I just described it, there is no possible way that these large language model-based programs can ever do anything that even approximates self-awareness, consciousness, or something we would have to be concerned about.
There is a completely static definition of these programs once they're trained. The underlying parameters of GPT-3, for example, once you train it up, do not change as you start running requests through it.
There is no malleable memory. It's the exact same rules. The only thing that changes is the input you give it. It goes all the way through these layers in a simple feed-forward architecture and spits out a next word.
And when you run it through again with a slightly longer request, it's the exact same layers; it spits out another word.
You cannot have anything that approaches consciousness or self-awareness without malleable memory. To be alive, by definition, you have to be able to have an ongoing, updated model of yourself and the world around you. There's no static entity, where no memory changes, where nothing in it changes, that you would consider to be alive.
So, no, this model is not the right type of AI technology that could ever become self-aware. There are other models in the AI universe that could be: models where you actually have notions of maintaining and updating models, of learning, of thinking about yourself interacting with the world, of having incentives, of having multiple actions you can take.
You could build systems that, in theory, down the line, could be self-aware. A large language model won't be it. Architecturally, it's impossible.
All right, so that's where we are.
We've created this really cool large language model.
It's better than the ones that came before.
It's really good at talking to people, so it's easy to use, and you can share all these fun tweets about it.
This general technology, one way or the other, will be integrated more and more into our working lives,
but it's going to have the impact, in my opinion, more like Google had once that got really good,
which was a big impact.
I mean, you can ask Google all these questions, how to define words.
It was very useful.
It really helped people, but it didn't make whole industries disappear.
And I think that's where we're going to be with these large language models.
They can produce text on arbitrary combinations of known subjects, using arbitrary combinations of known styles, where known means they've seen it a sufficient number of times in their training.
This is not HAL from 2001.
This is not an alien intelligence that is going to, as was warned in that New York Times op-ed,
deploy sophisticated propaganda to take over our political elections and create a one-world government.
This is not going to get rid of programming as a profession and writing as a profession.
It is cool, but it is not, in my opinion, an existential threat. It's transformative in the world of AI. It probably will not, in the immediate future, be transformative in your day-to-day life.
All right.
So, Jesse, there's my professor sermon for the day. You don't want to get me started on computer science lectures, because I could fall into my old professorial habits and really bore you.
So how many rules will there be in five years? Like, will it double? I don't know how much bigger it can get.
Yeah, it's a good question. So there was a jump from GPT-2 to GPT-3. GPT-2 had one of the largest parameter counts before GPT-3 came out, at around 1.5 billion, and GPT-3 has 175 billion. I was talking to an expert at MIT about this. The issue with making this much larger is that they're already giving it essentially all of the text that exists, and at some point, you're not going to get bigger returns.
So he said there are two issues with this. If you make your networks too small, they're not complicated enough to learn enough rules to be useful. But if you make them too large, you're wasting a lot of space; you're just going to have a lot of redundancy.
I mean, it can only learn what it sees in its data set. So if 175 billion parameters is well fit to the massive training data we use for these chatbots, then just increasing the size of the network is not going to change much. You would have to have a correspondingly larger and richer training data set to give it. And I don't know how much richer or larger a data set we could give it, at least for this very particular problem of producing text.
And I actually think the direction now is: how do we make these things smaller again? GPT-3 is too big to be practical. 175 billion parameters can't fit in the memory of a single GPU. You probably need five different specialized pieces of hardware just to generate a single word. That's not practical. That means I can't run it on my computer. That means if everyone at my office is constantly making requests to GPT-3 as part of their work, we're going to have a huge computing bill.
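The memory arithmetic behind that claim, under two assumptions of mine (not from the episode): each parameter is stored as a 16-bit number, and a high-end GPU has about 80 GB of memory.

```python
import math

# Why 175 billion parameters won't fit on one GPU.
# Assumptions (illustrative): 2 bytes per parameter (16-bit floats),
# 80 GB of memory per high-end GPU.
parameters = 175_000_000_000
bytes_per_param = 2
gpu_memory_gb = 80

model_gb = parameters * bytes_per_param / 1e9     # 350 GB of weights alone
gpus_needed = math.ceil(model_gb / gpu_memory_gb)
print(model_gb, gpus_needed)  # -> 350.0 GB, spread across at least 5 GPUs
```

That is just the weights; activations and bookkeeping push the real requirement higher, which is why generating even a single word needs several devices working together.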
So actually, a lot of the effort is in how we make these things smaller. Just focus on the examples that are relevant to what people actually need. We want them to be small. We want eventually to have models that can fit on a phone and still do useful things.
GPT-3, which is what all these other models are based off of, was OpenAI asking: what happens if we make these things much bigger? And now we're going to go back to making them smaller.
If you actually read the original GPT-3 paper, their goal in making it 10 times bigger was not to get 10 times better answers in any particular domain. They wanted one model that could do well on many unrelated tasks. In the paper they say, look, here are a bunch of different tasks for which we already have large language models that do well. But those models are customized; they can only do that one task. What they were proud of with GPT-3 is that this one model can do well on all ten of these tasks. It's not that it was doing much better than the state of the art on any one of them. It was just: look, you don't necessarily need to hand-train a model for each task. If you make it big enough, it can handle all the different tasks.
So getting 10 times larger didn't make GPT-3 ten times better at any particular task. In fact, on most tasks, it's as good as the best, but not much better. It was the flexibility and the broadness.
And that's good to see, and it's cool for these web demos. But going forward, the name of the game, I think, is going to be: actually, we need to make these things smaller, so that we don't have to use an absurd amount of computational power just to figure out that "dog" should follow "the quick brown fox jumped over the lazy brown," right? We need to be a little bit more efficient.
But anyways, it's a cool technology. I just think once you open up this box, it's not as worrisome.
Yeah.
But when it's a black box, you can imagine anything.
I think that Yuval Harari op-ed was definitely influenced by Nick Bostrom's Superintelligence, which we talked about on the show a few months ago.
Yeah.
Where he just starts speculating. He's a philosopher, not a computer scientist. Bostrom just starts speculating: what if it got this smart, what could it do? Well, what if it got even smarter, what could it do then? Just thinking through scenarios. If it got smart, it could make itself smarter, and then it could make itself even smarter, and it becomes a superintelligence. And then there are all these scenarios about, well, if we had a superintelligent thing, it could take over all world politics, because it would be so smart and understand us so well that it could craft the perfect propaganda, and the bot could get us all to do whatever it wanted us to do.
It's all just philosophical speculation.
You open up these boxes and you see 175 billion numbers being multiplied by GPUs, doing 1.5 million books' worth of pattern detection and vote rules, to generate a probability vector so it can select a word.
All right, well, there's my computer science sermon.
I have a few questions I want to get to
from you, my listeners, that are about artificial intelligence.
First, I want to mention one of the sponsors
that makes this nonsense possible,
and we're talking about our friends at ZocDoc.
ZocDoc is the only free app that lets you find and book doctors who are patient-reviewed, take your insurance, and are available when you need them, to treat almost every condition under the sun.
So if you need a doctor, instead of just asking a friend or looking it up in the yellow pages, you can go to the ZocDoc app, and it'll show you doctors nearby who take your insurance, and you can read reviews. And then once you sign up with these doctors, they'll often use ZocDoc to make it easier for you to set up appointments, get reminders about appointments, and send in paperwork.
Both my dentist and my primary care physician use ZocDoc, and I find it really useful because all of my interactions with them happen through that interface.
So go to Zocdoc.com/deep and download the ZocDoc app for free, then find and book a top-rated doctor today, many of whom are available within 24 hours. That's Z-O-C-D-O-C dot com slash deep, Zocdoc.com/deep.
The show is also sponsored by BetterHelp. As we often talk about, when it comes to the different buckets relevant to cultivating a deep life, the bucket of contemplation is in there. Having an active and healthy life of the mind is critical to a life that is deep in many different ways.
It is easy, however, due to various events in life, to fall into a place where your relationship to your mind gets strained. Maybe you find yourself overwhelmed with anxious thoughts, or ruminations, or depressive moments where you feel a lack of affect.
Whatever it is, it is easy for your mind to get out of kilter.
Well, just like if your knee started hurting you would go to an orthopedist, if your mind starts hurting... and by the way, there's really loud construction noise going on. That's the restaurant below us being built. It had better be worth it.
But returning to BetterHelp, let me add this example: if you find yourself becoming increasingly enraged because of restaurant construction noise that occurs during your podcast, among the other things that could affect your mental health, you need a therapist.
An orthopedist will fix your bum knee. A therapist helps make sure that your cognitive life gets healthy, gets resilient, gets back on track.
The problem is it's hard to find a therapist. If you live in a big city, all the ones near you might be booked, or they might be really expensive.
this is where Better Help enters the scene.
If you're thinking about starting therapy,
BetterHelp is a great way to give it a try
because it's entirely online.
It's designed to be convenient, flexible,
and suited to your schedule.
You just fill out a brief questionnaire
to get matched with a licensed therapist
and you can switch therapists anytime for no additional charge.
So discover your potential with BetterHelp. Visit BetterHelp.com/deepquestions, one word, today to get 10% off your first month. That's BetterHelp.com/deepquestions.
I hope this restaurant's good, Jesse,
after all the disruption.
Yeah.
They put up the signage.
I don't know if you saw that.
I didn't see the signage.
I've heard the music.
Motocat.
That's the name?
Yeah.
It's a Tacoma Park reference.
Okay.
Yeah.
It's an old character from Tacoma Park history.
Anyways, soon it will be open.
I heard, I was talking to the guy, he thought, May.
Yeah, so soon.
Nice.
All right, let's hear some questions.
What do we got?
All right.
First question is from Manav, a student at Yale.
Looking at tools like chat GPT makes me feel like there's nothing AI won't eventually do better than humans.
This fear makes it hard to concentrate on learning since it makes me feel that there isn't
certainty in my future.
Are my fears unfounded?
Well, Manav, hopefully my deep dive is helping dispel those fears.
I wanted to include this question in part to emphasize the degree to which the hype cycle around these tools has really been unhelpful.
Because you can so easily embed screenshots of interactions with ChatGPT, a lot of people started trying it, because the attraction of virality is very strong in our online age.
So it brought ChatGPT to the awareness of a lot of people and generated a lot of attention.
Now, once you have a lot of attention, how do you one-up that attention? Well, then you start thinking about worries. You start thinking about what if it could do this, what if it could do that.
From what I understand, and I'm not as plugged into these online worlds as others, there's been a whole tech-bro push during the last few months, increasingly pushing: it can do this, it can do that. Exactly the same tonality with which the same group talked about crypto two years ago. Everything is going to be done by ChatGPT, just like currency was going to be gone in three years because of crypto.
They turned all their attention onto that, and it got really frenzied, and everyone was trying to one-up each other with YouTube videos and podcasts about, no, it can do this, no, it can do that.
And then this created a weird counterreaction from the mainstream media, because the mainstream media right now has an adversarial relationship with the Silicon Valley tech-bro crowd. They don't like them. So then they started pushing back: no, it's going to be bad. These tech bros are leading us to a world where students will cheat on tests. No, forget cheating on tests, it's going to take all of our jobs. No, forget taking all of our jobs, it's going to take over the government. It's going to become superintelligent.
So the counterreaction to the overblown enthusiasm of the tech bros became an overblown grimness from the anti-tech-bro mainstream media.
All of it fed into Twitter, which, like a blender, mixed it all together, swirling this spiral of craziness higher and higher, until finally the average person, like Manav here at Yale, is thinking: how can I even study, knowing that there will be no jobs and we'll be enslaved by computers within the next couple of years?
All right, so, Manav, hopefully my deep dive helped you feel better about this.
ChatGPT can write about combinations of known subjects in combinations of known styles. It does not have models of these objects. It has no state or understanding or incentives.
When you ask it to write about
removing a peanut butter sandwich from a VCR,
it does not have an internal model of a VCR and a sandwich
on which it's experimenting with different strategies
to figure out which strategy works best,
and then turns to its language facility
to explain that to you.
It just sees that "peanut butter" is a possible next word to spit out, and you asked about peanut butter, so it puts more votes on it.
Mixing, matching, copying, manipulating existing human text. The humor in the jokes it spits out, the accuracy of the styles it uses, are all intelligence borrowed from the input it was given.
It does not have a broad, adaptable intelligence that can, in any significant sense, impact the knowledge work sector.
And it's important to emphasize, Manav, that it's not like we're one small step away from making these models more flexible, more adaptable, able to do more things.
The key to ChatGPT being so good at the specific thing it does, which is producing text in known styles on known subjects, is that it had a truly massive amount of data on which it could train itself. We could give it everything anyone had written on the internet for a decade, and it could use all of that to train itself.
This is the problem when you try to adapt these models to other types of activities that are not just producing text.
You say, what I really want is a model that can, you know, work with my databases.
What I really want is a model that can, you know, send emails and attach files on my behalf.
The problem is you don't have enough training data.
You need training data where you have billions of examples of: here's the situation, here's the right answer.
And unlike most things that we do, which we learn after a small number of examples,
a model for activities other than producing text needs a ton of data.
And in most other types of genres or activities, there's just not that much data.
So one of the few examples where there is is art production.
This is how DALL-E works.
You can give it a huge corpus of pictures that are annotated with text.
And it can learn the different styles and subject matter that show up in pictures and then produce original artwork.
But that's one of the few other areas where you have enough uniform data that it can
actually train itself to be super adaptable. So I'm not worried, Monav, that all of our jobs
will be gone. So your fears are unfounded. You can rest easy; study hard for your classes.
All right, let's keep it rolling. What do we got, Jesse?
All right. Next question is from Aden. It seems almost inevitable that in 10 years, AI will be able to
perform many knowledge workers' jobs as well as a human. Should we be worried about the pace of
automation in knowledge work, and how can we prepare our careers now for the increased
power of AI in the coming decades?
So as I just explained in the last question, this particular trajectory of AI technology is
not about to take all of your jobs.
There is, however, and this is why I included this question, there is, however, another
potential intersection of artificial intelligence and knowledge work that I've been talking
about for years that I think we should be more concerned about, or at least keep a closer
eye on. The place where I think AI is going to have the big impact is less sexy than this notion of, I just have this blinking chat cursor and I can ask this thing to do whatever I want. Where it's really going to intersect is shallow task automation. So the shallow work: the stuff we do, the overhead we take on to help collaborate, organize, and gather the information needed for the main deep work that we execute in our knowledge work jobs. More and more of that is going to be taken over by less sexy, more
bespoke, but increasingly more effective AI tools.
And as these tools get better, I don't have to send 126 emails a day anymore because I can
actually have a bespoke AI agent handle a lot of that work for me, not in a general,
it's intelligent sense, but in a much more specific like talking to Alexa type sense.
Can you gather the data I need for writing this report?
Can you set up a meeting next week for me with these three principals?
And then that AI agent talks to the AI agents of the three people you need to set the meeting up with.
And they figure it out together and put that meeting onto the calendar, so that none of the three of us ever have to exchange an email.
It gathers the information from the people who have it by talking to their AI agents, and I never have to bother them.
We never have to set up a meeting.
It's able to do these rote tasks for us, right?
This was actually a future that I was exposed to a decade earlier.
I spoke at an event with the CEO of an automated meeting scheduling company called x.ai.
I remember him telling me this is the future.
When you have an AI tool that can talk to another person's AI tool to figure out logistical things on your behalf so that you're never interrupted.
I think that's where the big AI impact is going to come.
Now, this does not automate your main work.
What it does is it automates away the stuff that gets in the way of your main work.
Why is that significant?
because it will immensely increase the amount of your main work you're able to get done.
If you're not context switching once every five minutes,
which is roughly how long the average knowledge worker goes between email or instant messenger checks,
if you're not doing that anymore, you know how much you're going to get done?
You know how much if you can just do something until you're done?
And then the AI agents on your computer says, okay, we got the stuff for you for the next thing you need to work on.
Here you go.
And you have to have no overhead communicating or collaborating and trying to figure out what to do next.
You can just execute.
you know how much you're going to get done?
I would say probably three to four X more.
The meaningful output that you produce in your job
will increase three to four X
if these unsexy, bespoke
AI logistical automation tools get better.
So this has this huge potential benefit
and this huge potential downside.
The benefit, of course, is your work is less exhausting.
You can get a lot more done.
Companies are going to generate a lot more value.
The downside is that it might greatly reduce
the number of knowledge workers required
to meet certain production outputs.
If three knowledge workers now produce what it used to take 10,
I could grow my company,
or I could fire seven of those knowledge workers.
So I think this is going to create a disruption.
We underestimate the degree to which shallow work and context shifting
is completely hampering our ability to do work with our minds.
But because it's like the pot that keeps getting hotter until the lobster is boiled,
because it's affecting everybody,
we don't realize how much we're being held back.
When computer tools aided by AI remove that,
it's going to be a huge disruption.
And I think ultimately the economy is going to adapt to it.
The knowledge industry is going to explode in size and scope
as we can unlock all this cognitive potential
on new types of challenges or problems
that we weren't thinking about before.
Ultimately, it'll probably be good
and lead to a huge economic growth,
but I think there's going to be a disruption period.
Because, again,
we just don't appreciate the degree to which we're inefficient.
And if we could remove that inefficiency,
we wouldn't need most of the people sitting here in this office to get the same work done.
Getting over that, that's going to be the real disruption.
And there's no scary HAL-from-2001-type tool involved here.
These things are going to be boring.
Meeting scheduling, information gathering.
Basically, whatever you type in an email, it could do that for you.
That's going to be the real disruption.
I don't know exactly when it's coming.
But it's coming soon.
A lot of money at stake.
All right, this is good.
I'm like debunking people's fears.
Yeah.
Like a therapist today.
All right.
Next question is from Ben, a Silicon Valley engineer.
I've decided that web development freelancing
would be the best possible career path
to achieve my family's lifestyle vision.
And I plan to freelance in addition to my full-time job
until freelancing can support our life on its own.
Over the last few weeks, however,
I've been hearing about the breakthroughs of ChatGPT
and other AI tools.
Do you think I should stay on the path
of learning the ropes of freelance web development, or should I focus more on the future
of technology and try to stay ahead of the curve?
Well, Ben, I'm using the inclusion of AI in your question to secretly get in an example of
lifestyle-centric career planning, which you know I like to talk about. So I love your approach.
You're working backwards from a vision of what you want you and your family's life to be like,
a tangible lifestyle, not specific as to a particular
job or city, but the attributes of the lifestyle.
And then you're working backwards to say, what is a tractable path from where I am to accomplish
that?
And you're seeing here, web development could be there, freelance web development.
And I don't know all the details of your plan, but I'm assuming you probably have a plan
where you're living somewhere that is cheaper to live, maybe it's more outside or country-oriented,
where your expenses are lower.
And because web development is a relatively high reward per hour spent type activity,
strategic freelancing could support
your lifestyle there while giving you huge amounts of autonomy, therefore satisfying the various
properties that I assume you figured out about what you want in your life, these sort of non-professional
properties. So I really applaud this thinking. I also really applaud the fact that you're applying
the principle of letting money be a neutral indicator of value, a strategy I talked about in my book
So Good They Can't Ignore You. This is a strategy in which, instead of just jumping into something
new, you try it on the side and say, can I actually make money at this?
The idea here is that people paying you money is the most unbiased feedback you will get about how valuable or viable the thing you're doing actually is.
And so I really like this idea.
Let me try freelancing on the side.
I mean, I want to see that people are hiring me and this money is coming in before I quit my job.
It's a great way of actually assessing.
Don't just ask people, hey, is this valuable?
Do you think you would hire me?
Look at dollars coming in.
When the dollars coming in are enough to more or less support your new lifestyle, you can make that transition with confidence.
So I appreciate that as well.
Two Cal Newport ideas being deployed here.
So let's get to your AI question.
Should you stop this plan?
So you can focus more on the future of technology
and try to stay ahead of the curve.
I mean, I don't even really know what that means.
My main advice would be, whatever skill it is you're learning,
make sure you're learning it at the cutting edge.
Get real feedback from real people in this field
about what is actually valuable
and how good you have to be to unlock that value.
So I would say that.
Don't invent your own story
about how you want this field to work.
Don't assume that if you know
HTML and a little CSS,
you're going to be living easy.
What are the actual skills people care about?
What web development technologies sell?
How hard are they?
How good do you have to be at them
to actually be desirable to the marketplace?
Get hard, verified answers to those questions.
That's what I would say
when it comes to staying ahead of the curve.
That's it.
But as for some sort of fear
that becoming a web developer
is quixotic
because ChatGPT is going to take
that job away soon: don't worry about that.
So yes, make sure you're learning the right skills, not the skills you want to be valuable,
but there's nothing wrong with this particular plan you have.
I love the way you're executing it.
I love the way you're thinking about it.
I also appreciated, Jesse, that Ben had a joke at the beginning of his message.
He said, I've been doing lifestyle-centric career planning.
I've been thinking, I don't like my job.
So what we're going to do is quit, and I'm going to start a cattle ranch.
He's like, ah, just joking.
I appreciate that.
All right, let's do one more question before I kind of run out of the ability to talk about AI anymore.
I'm getting just, we're purging this.
It's been months that people have been asking us about AI.
Yeah.
Just purging it all out there.
Next week, we're going to talk all about, I don't know, living in the country or minimalism, not using social media.
But we're getting all the AI out of our system today.
All right, here we go.
Last question is from Anakin.
AI can solve high-school-level math
word problems. AI can explain why a joke it has never seen before is funny. This one blows my mind.
All this points to mass job loss within five years. What do you think? Well, again, big thumbs down.
The current trajectory of AI is not going to create mass job loss in the next five years.
ChatGPT doesn't know why a joke is funny. It writes jokes that are funny because it knows what parts of a
script it can identify as jokes; that's a pattern-matching problem. And then it upvotes words
from those parts of scripts when it's doing the word-guessing game.
And as a result, you get jokes that pull from existing structures of humor in existing text.
It does not actually know what humor is.
You can see that in part if you look at that Seinfeld scene about bubble sort
that I talked about at the beginning of the program.
There are non sequitur jokes in there.
Things described as making the audience laugh that aren't actually funny.
And that's because it's not actually looking at the script and saying, is this funny?
It's guessing words,
guessing words in a way that sounds accurate.
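For reference, bubble sort, the algorithm the AI-generated Seinfeld scene was riffing on, is a classic beginner's sorting routine: repeatedly swap adjacent out-of-order elements until a full pass makes no swaps. A minimal Python version:

```python
def bubble_sort(items):
    """Sort a list in place by repeatedly swapping adjacent
    out-of-order pairs; larger values 'bubble' toward the end."""
    n = len(items)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):  # the last i items are already in place
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # a full pass with no swaps means we're done
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```

The point stands: the model can write dialogue *about* this algorithm because the internet is full of text about it, not because it understands sorting or comedy.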
But let's talk about, I want to use this as an excuse to talk about another trend in AI that I think is probably more important than any of these large language models that also is not getting talked about enough.
So we talked about, in an earlier question, AI shallow work automation being critical.
The other major development, one we're so used to now that we forget about it, but that I think is actually going to be the key to this AI shallow work automation, and to all sorts of other ways AI enters
our lives, is not these large language models. It's what Google has in mind with Google Home.
It's what Amazon has in mind with Alexa.
These at-home appliances that you can talk to and ask to do things, they're ubiquitous.
And they're ubiquitous in part because these companies made them very cheap.
They wanted people to use them.
They're not trying to make a lot of money off them.
Why?
Why is Google or Amazon trying to get as many people as possible to use these agents at home that
you can just talk to while it tries to understand you?
It's data.
The game they are playing is we want millions and millions of different people with different
accents and different types of voices asking about all sorts of different things that we
then try to understand.
And we could then use this data set to train increasingly better interfaces that can
understand natural language.
That's the whole ballgame.
Now, ChatGPT is pretty good at this.
They figured out a new model.
I don't want to get into the weeds.
They have a human-supervised reinforcement learning model
that they added on top of the GPT-3 training to try to align its responses better with what's being asked.
But this is the whole ballgame: just natural language understanding.
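The alignment step Cal is gesturing at, reinforcement learning from human feedback, can be caricatured like this. Everything below, the candidate responses and the scores, is invented purely for illustration; the real system trains a neural reward model on human preference rankings and uses it to update the language model, rather than looking scores up in a table.

```python
# Toy caricature of the RLHF idea: humans rank candidate answers,
# those rankings become a reward signal, and generation is steered
# toward higher-reward responses. All data here is made up.

def pick_best(candidates, reward):
    """Return the candidate a (stand-in) reward model scores highest."""
    return max(candidates, key=reward)

# Stand-in for a reward model trained on human preference rankings:
# it prefers answers that actually address the user's request.
human_preference_scores = {
    "Here are the steps to remove the sandwich:": 0.9,
    "Peanut butter is made from peanuts.": 0.4,
    "VCRs were popular in the 1990s.": 0.2,
}

best = pick_best(human_preference_scores, human_preference_scores.get)
print(best)  # Here are the steps to remove the sandwich:
```

The design point: the reward signal doesn't teach the model what a VCR *is*; it only nudges the word-guessing toward responses humans rated as on-topic.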
And Google is working on this, and Amazon is working on this, and Apple is working on this with their Siri devices.
And this is going to be, this is what matters, understanding people.
The background activities, I think this is what we often get wrong.
The actual activities that the disruptive AI in the future is going to do on our behalf are not that interesting.
It's not we're going to go write an aria.
It's we're going to pull some data from an Excel table and email it to someone.
It's we're going to turn off the lights in your house because you said you were gone.
It's really kind of boring stuff.
All of the interesting stuff is understanding what you're saying.
And that's why Google and Amazon and Apple invested so much money into getting these devices everywhere
is they want to train and generate the biggest possible dataset of actually understanding what real people are saying
and figuring out, do we get it right or do we get it wrong?
Can we look at examples and let's hand annotate these examples and figure out how our models work?
And I think this is really going to be the slow creep of AI disruption.
It's not going to be this one entity that suddenly takes over everyone's job.
It's going to be that more and more devices in our world are increasingly better at understanding natural language questions, whether it be typed or spoken, and can then act accordingly even if what we're asking to do is simple.
And we don't really need these agents to do really complicated things.
We just need them to understand what we mean.
Most of what we do
that's a drag on our time
and a drag on our energy
is pretty simple,
and it's something a machine
could do
if it just knew
what it was we wanted.
And so I think
that's where we should be focusing:
the interface is what matters.
These 175-billion-parameter models
that can generate all this text
are really not that interesting.
I don't need
a Seinfeld script
about bubble sort
I need you to understand me
when I say
give me the email addresses
of all of the students from the first section of my class.
I need you just to understand what that means
and be able to interface with the student database
and get those emails out and format it properly
so I can send the message to the class.
That's what I want you to do.
I don't want you to write an original poem
in the style of Mary Oliver
about a hockey game for my students.
I need you to just, when I say,
look at the assignment pages for the problem sets
I assigned this semester, pull out the grading statistics and put them all into one document,
go, okay, I know what that means.
And now I'm doing a pretty rote, automatic computer type thing.
And I don't care if you come back and say, okay, I have three follow-up questions so I understand
what you mean.
You mean this section, that section, this section, that's fine.
That'll just take us another 10 or 15 seconds.
I don't need, in other words, HAL from 2001.
I just need my basic computer to understand what I'm saying so I don't have to type it in
or click on a lot of different buttons.
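The division of labor Cal describes, hard natural-language understanding feeding a trivial rote action, can be sketched like this. The roster, field names, and helper function below are all hypothetical; the hard, unsolved part is the step that turns the spoken request into the structured call, which this sketch skips entirely.

```python
# Toy sketch: once "give me the email addresses of all the students
# from the first section of my class" is parsed into a structured
# intent (section=1), the action itself is boring list filtering.
# All data and field names are invented for illustration.

students = [
    {"name": "Ana", "section": 1, "email": "ana@example.edu"},
    {"name": "Raj", "section": 2, "email": "raj@example.edu"},
    {"name": "Mei", "section": 1, "email": "mei@example.edu"},
]

def emails_for_section(roster, section):
    """The rote, automatic computer-type part: filter a roster."""
    return [s["email"] for s in roster if s["section"] == section]

print(emails_for_section(students, 1))  # ['ana@example.edu', 'mei@example.edu']
```

All of the interesting work lives upstream of that one-line filter: understanding which roster, which section, and which format the user meant.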
So, I mean, I think that's really where we're going to see the big innovations: the slow creep of better and better human understanding plugged into relatively non-interesting actions.
That's really where this stuff is going to play a bigger and bigger role.
The disruption is going to be more subtle.
This idea that this all-at-once large language model represents a new self-sufficient being that will do everything at once:
it's sexy, but I don't think that's the way this is going to happen.
All right, so let's change gears here, Jesse.
I want to do something interesting to wrap up the show.
First I want to mention one of the longtime sponsors that makes deep questions possible.
That's our friends at Blinkist.
When you subscribe to Blinkist, you get access to 15-minute summaries of over 5,500 nonfiction books and podcasts.
These short summaries, which are called Blinks, you can either read them or you can listen to them while you do other activities.
Jesse and I both use Blinkist as one of our primary tools for triaging
which books we read. If there is a book we are interested in, we will read or listen to the Blink
to see if it's worth buying. The other week, Jesse and I went through his own Blinkist list,
and we calculated that for roughly, what, 30%, I think you said, 30% of the books for
which you read Blinks, you go on to buy the book. So there we go. That is a case in point of the
value of Blinkist. It really helps you home in on the right books to buy. And the books you don't
buy, you still know their main ideas so you can deploy them. So Blinkist is really a tool that any serious
reader needs to add to their toolkit. So right now Blinkist has a special offer just for our audience.
Go to Blinkist.com slash deep to start your seven-day free trial and you will get 45% off a
Blinkist premium membership. That's Blinkist, spelled B-L-I-N-K-I-S-T: Blinkist.com slash deep to get
45% off and a seven-day free trial. That's Blinkist.com slash deep. This offer is good through
April 30th only.
Even better news, for a limited time,
you can take advantage of their Blinkist Connect program
to share your premium account.
You will, in other words, get two premium accounts
for the price of one.
So when you subscribe to Blinkist,
you can give an account to a friend
who you think would also appreciate it.
So Blinkist.com slash deep to find out more.
I also want to talk about Ladder.
It's tax season.
Tax season is when I often get stressed about putting things off, because I don't even know where to get started.
This got me thinking about the other classic thing that people put off, which is getting life
insurance.
This is one of those things that you know you need, but you don't know where to get started.
Who do you talk to for life insurance?
How much should life insurance cost?
You're going to have to go to a doctor and get blood drawn and, you know, your eyeballs scanned
before you can actually get a policy.
And we get paralyzed by all this complexity and say, forget it.
Well, this is where ladder enters the scene.
Ladder makes it easy to get life insurance.
It's 100% digital, no doctors, no needles, and no paperwork when you're applying for $3 million in coverage or less.
You just answer a few questions about your health on an application.
You need just a few minutes and a phone or laptop to apply.
Ladder's smart algorithms work in real time, so you'll find out instantly if you're approved.
No hidden fees.
Cancel any time and
get a full refund if you change your mind in the first 30 days.
Life insurance is only going to cost more as you get older,
so now is always the right time to get it.
So go to ladderlife.com slash deep today to see if you're instantly approved.
That's L-A-D-D-E-R-Life.com slash deep.
Ladderlife.com slash deep.
All right, Jesse, let's do something interesting.
for those who are new to the show,
this is the segment where I take something that you sent
to my interesting@calnewport.com email address,
something you thought I might find interesting.
I take something that caught my attention
and I share it.
So here's something a lot of people sent me.
This is actually from just a couple days ago.
I have it up on the screen now.
So if you're listening, you can watch this
at youtube.com/calnewportmedia, episode 244.
It is an article from NPR.
With the following music-to-my-ears headline: NPR quits Twitter.
There's a whole backstory to this.
NPR is having basically a feud with Twitter, because Twitter started labeling the network as state-affiliated media,
the same description they use for propaganda outlets in Russia, China, and other autocratic countries.
That did not sit well with NPR.
So then they changed it and said,
okay, well, we'll call you government-funded media.
But NPR said only 1% of our annual budget comes from federal funding.
That's not really accurate either.
You know what?
Enough.
They're walking away.
NPR Politics, for example, put out a tweet that said, for all of our 52 Twitter accounts,
we're not going to use them anymore.
You want news from NPR?
Subscribe to our email newsletter.
Come to our website.
Come to our website.
Listen to our radio program.
will keep you up to date.
You don't have to use this other guy's program.
I really like to see that.
Not because of the internecine political battles between Musk and the different media outlets.
I mean, I wish they would just make this move even without that.
But whatever gets them there, I think, is good.
As I've talked about so many times on this show before, it is baked into the architecture of Twitter that it is going to generate outrage.
It is going to manipulate your emotions.
It is going to create tribalism and is going to really color your understanding of the world,
your understanding of current events in ways that are highly inaccurate and often highly inflammatory.
It's just built into the collaborative curation mechanism of all these retweets combined with the power-law network.
We've talked about this before.
It's not a great way to consume or share information.
Now, more and more outlets are doing this.
So a couple of weeks ago, we gave the example of the Washington Post's
Nationals baseball team coverage shifting away from live-tweeting the games and instead
having live updates on the WashingtonPost.com website.
And at the end of all those updates, then they write a recap article.
And it's all packaged together.
And you can see how the game unfolded, they have different people writing, and it
all plays out in real time.
And I think the whole thing is great.
There's no reason to be on someone else's platform mixing tweets about the baseball game
with tweets about, you know, the Martian that's going to come and destroy the earth
because you didn't give it hydroxychloroquine or whatever the hell else is going on on Twitter.
Twitter was a lot of fun.
It's very engaging.
It's not the right way to consume news.
It's not the right way to spread news.
And I'm really happy about this trend.
I think we would be in a much calmer place as individuals.
I think we'd be in a much calmer place politically.
I think we'd be in a much calmer place just from a civic perspective.
if more and more sources of news,
if more and more sources of expression,
if more and more sources of commentary,
move to their own private gardens.
Here's my website.
Here's my podcast.
Here's my newsletter.
Not this giant mixing bowl
where everyone and everything comes together
in a homogenized interface.
And we have this distributed curation mechanism
rapidly amplifying some things versus others.
That's not a healthy way for a
society to learn about things and communicate.
And, you know, I wrote about this in Wired magazine early in the pandemic.
I wrote an op-ed for Wired.
I said, one of the number one things we could do for public health right now at the beginning
of the pandemic would be to shut down Twitter.
And I gave this argument in that article: look, if professionals retreated
to websites, we could have considered, long-form articles with rich links to other
types of articles and sources, where the websites themselves could indicate authority by
the fact that the website is hosted at a hospital or a known academic institution or a known
news organization.
And we'd be able to curate in an individual sense: this website is at, you know, to reference
an old SNL skit, clownpenis.fart, and it has weird GIFs of eagles that are, you know,
flying away with bin Laden,
and I'm not going to pay as much attention to that
as to this long-form article
that's coming out of the Cleveland Clinic, right?
And I said if we went back,
humans would be much better at consuming information,
and the information would be much better presented.
If we went back to more bespoke, distributed communication,
as opposed to having everyone,
the cranks and the experts and the weirdos
and everybody all mixed together in the exact same interface,
where their tweets look exactly the same,
and a completely dispassionate,
distributed curation mechanism,
rapidly amplifying things that are catching people's attention.
We need to get away from that homogenization.
We need to get away from that distributed curation.
And we need to get back to more bespoke things.
We can learn from where you're hosted.
What does this look like?
What is the text?
What are the things you have written?
We can really have context for information.
Anyways, I don't want to go too far into this lecture.
But basically, this is a good trend.
I think individually owned digital distribution of information,
podcast, websites, blogs, newsletters,
is going to be a much healthier trend than saying,
why don't we all just have one or two services that everyone uses?
So good job, NPR.
I hope more and more news sources follow your lead, and the Washington Post Nationals reporters' lead.
I think this is the right direction.
Do you know the clown penis dot fart reference?
No.
It was in the late 90s.
It was a classic SNL skit, and it was like an advertisement.
And they really nailed it.
You know how they used to have those advertisements for brokerage firms?
Yeah.
Where it's like super authoritative, you know, it's like Wilford Brimley,
welcome to such-and-such financial partners, where trust is our number one, whatever.
And so it's like this real serious ad.
And they're like, you can find out more at our website, clownpenis.fart.
And then they kind of reveal, by the time we set up our website,
it was the only address left.
So the premise of it is this really serious financial brokerage firm,
but in the late 90s, it felt like all the URLs were gone,
and so it was like the only URL that was left.
And so it was just a very serious commercial
where they kept having to say clownpenis.fart.
Classic SNL.
All right, I'm tired.
It's too much AI.
I'm happy now to talk about AI whenever you want.
My gag order has been lifted,
but maybe it'll be a while until we get too much more deeply into that.
But thank you all for listening.
Thank you all for putting up with that.
We'll be back next week with the next,
hopefully much less computer-science-filled, episode of the podcast.
And until then, as always, stay deep.
Hi, it's Cal here.
One more thing before you go.
If you like the Deep Questions podcast, you will love my email newsletter, which you can sign up for at calnewport.com.
Each week, I send out a new essay about the theory or practice of living deeply.
I've been writing this newsletter since 2007, and over 70,000 subscribers get it sent to their inboxes each week.
So if you are serious about resisting the forces of distraction and
shallowness that afflict our world, you've got to sign up for my newsletter at calnewport.com
and get some deep wisdom delivered to your inbox each week.
