Hard Fork - OpenAI’s Big Reset + A.I. in the Doctor’s Office + Talkie, a pre-1930s LLM

Episode Date: May 1, 2026

This week, OpenAI announced a loosened partnership with Microsoft and an aggressive new strategy to secure computing power. We unpack what these updates signal about OpenAI's business strategy and whether the company can scale while balancing a trial against Elon Musk and investor concerns over missed financial targets. Then, the A.I. researcher Dr. Adam Rodman, of Harvard Medical School, returns to tell us about the most significant ways A.I. is changing how doctors treat patients. And finally, can an LLM trained only on very old texts predict the future? We're talking with one of the creators of the chatbot Talkie.

Guests:
Dr. Adam Rodman, internal medicine physician at Beth Israel Deaconess Medical Center and assistant professor at Harvard Medical School.
David Duvenaud, associate professor at the University of Toronto, former team lead at Anthropic and co-creator of Talkie.

Additional Reading:
Microsoft and OpenAI Loosen Their Partnership
Elon Musk and Sam Altman's Epic Fight Heads to Court
OpenAI Misses Key Revenue, User Targets in High-Stakes Sprint Toward IPO
Take It From a Doctor: It's OK if Your Medical Advice Comes From A.I.

We want to hear from you. Email us at hardfork@nytimes.com. Find "Hard Fork" on YouTube and TikTok. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Transcript
Starting point is 00:00:00 Casey, I miss you. You are in New York? I am, Kevin, and of course, I miss you as well, but it's always fun to visit the mothership. You know, Ezra Klein just challenged me to a burping contest. So I've got that to look forward to later. Burping or burpee? You know what? I guess I should go read that email again. Okay. Well, I miss you. We have an empty chair here in San Francisco, and it's not the same. It's not the same, but I've been catching up on all the latest AI news, Kevin, and I had to ask, have you seen this thing about Codex and the goblins? Yes, this is the new update to OpenAI's Codex that is, like, obsessed with goblins? Yes, apparently the company had to add instructions to its latest model to forbid Codex from randomly mentioning an assortment of mythical and real creatures, including goblins, gremlins, raccoons, trolls, ogres, and pigeons. Or as we call them, our slate of guests on Hard Fork. Listen, we have a pigeon coming up, and you're really going to want to hear their take. No, I'd heard that OpenAI had been accused of gobblin' up copyright data, but do you have any explanation of what's been going on with the goblins?
Starting point is 00:01:14 Well, I mean, I think it's pretty clear what's happened, which is that when it built ChatGPT, OpenAI awakened an ancient evil, and this is the last line of defense we have against it breaking containment and killing our families. So let's hope the guardrails hold. I'm Kevin Roose, a tech columnist at The New York Times. I'm Casey Newton from Platformer. And this is Hard Fork. This week, OpenAI's Big Reset. We'll talk about the company's new business strategy and its dramatic trial with Elon Musk.
Starting point is 00:01:47 Then, Dr. Adam Rodman returns to the show to tell us about the latest advances in AI and medicine. And finally, can an AI made out of very old texts still predict the future? We're talking about Talkie. So, Casey, there's been a lot happening with OpenAI in particular this week. There seems to be something of a major strategic reset happening over there. They've got a new deal with Microsoft, an expansion of their deal with Amazon, changes to their Stargate compute strategy, and a new push toward new kinds of ad-supported subscriptions. And, of course, they've got this big trial with Elon Musk that started this week in Oakland. So let's get through all of it. But first, before we do that, let's make our
Starting point is 00:02:43 disclosures. I work for the New York Times, which is suing Open AI, Microsoft, and Perplexity. And my fiancee works at Anthropic. Okay, so let's start this week with this new Microsoft deal. So Microsoft and Open AI have, of course, been partners for many years. Microsoft remains the biggest investor in OpenAI. Their stake is valued at about $135 billion. But their relationship has also been strained over the years by various factors. And this week, they seem to be sort of consciously uncoupling, or at least rewriting their partnership agreement and allowing OpenAI to be a little bit more promiscuous in who they do deals with. Yeah, I mean, Open AI just had this real challenge, which was that until this week, they were really only allowed to serve their
Starting point is 00:03:31 models on Microsoft's infrastructure. And one thing we talk about on the show a lot is just that, a lot of the big cloud service providers, their infrastructure is just maxed out, and Microsoft is one of those. And so for OpenAI's revenue to grow, they needed to find other ways that they could deliver their services. And so to my mind,
Starting point is 00:03:49 that was maybe the most important thing about this deal. Yeah. So under this new rewritten version of the Microsoft and OpenAI deal, Microsoft will no longer have to share revenue with OpenAI. The new deal also removes the part of the original agreement that had to do with AGI. The old agreement said that basically,
Starting point is 00:04:06 once OpenAI reached AGI. Microsoft would stop getting certain revenue share payments, but under the new agreement, OpenAI will keep sharing revenue with Microsoft until 2030 no matter what benchmarks they hit. So the AGI clause is gone. And I, for one, will be
Starting point is 00:04:23 sad to see it go because I think it was sort of the funniest clause in the entire AI world, right? It was basically like, well, if we ever get to a point where Open A.I. says the magic word in the entire world changes. And now they're not allowed to say the magic word anymore. Right.
Starting point is 00:04:39 So AGI has been poorly defined for many years, and everyone has got their own definition. But it did have this one interesting, like contractual stipulation. And now even that is off the table. So now we just have sort of AGI as evaluated by vibes. So Casey, what did you make of this loosened partnership between Open AIA and Microsoft? Well, I think it seems like probably a good deal for both of them, right? Like there was a moment when it seemed for both companies, like being very, very closely aligned was the best thing for both, arguably for a time it was. But with all of the various revenue, compute, and customer needs that both of these companies are now trying to serve, I think it's benefiting both of them to play the field and sign other partnerships.
Starting point is 00:05:27 So my read on this was like, this is basically good for both of them. But what about you? Yeah, I think it's good for both of them. I think it's a little better for Open AI. they got most of what they wanted. I think the bigger deal for them is the ability to work with other cloud providers. So now they can work with Amazon or Google Cloud Platform. And big corporate customers who use those cloud platforms can now use OpenAI models.
Starting point is 00:05:50 They don't have to go to Azure to do that. I think that allows them to strike these other bigger deals and to reach other corporate customers who may have been limited before by the fact that it's really hard and annoying to change cloud providers. Yeah. And speaking of big deals, They signed what seemed like a pretty big one with Amazon this week. Yeah, so Open AI wasted no time in its new Open Marriage with Microsoft.
Starting point is 00:06:13 It went back out on the market and found themselves in bed with Amazon. It found themselves in Bedrock with Amazon. But I'm sorry, you can go on. Yes, so on Tuesday, OpenAI and Amazon announced an expansion of their deal that they'd announced back in February that will allow OpenAI to sell its models through AWS's Bedrock AI platform. and make codex its coding model available on Bedrock as well. OpenAI and Amazon will reportedly also develop customized models to power Amazon's consumer-facing applications,
Starting point is 00:06:47 and Amazon will invest $50 billion in OpenAI. So there's some interesting stuff here. I think the interesting subtext to me is that Amazon, for a number of years now, has been pretty closely tethered to Anthropic as its primary sort of frontier model developer. And so OpenAI is kind of taking advantage of its newfound freedom by trying to elbow into Amazon
Starting point is 00:07:13 and maybe displace Anthropic as their favorite model provider. Well, I know that Amazon was talking a really big game about this deal. The CEO of AWS was giving interviews essentially saying, like, OpenAI belongs to us now. It was kind of a The Boy is Mind situation. I remember the old Brandy and Monica hit from back in the day. is kind of bringing that back in a little bit more of an AI flavor. I also have to say, I find it very interesting that Amazon named its platform Bedrock because that's where the
Starting point is 00:07:40 Flintstones are from. It seems a rather backward looking for a leading AI company, Kevin, wouldn't you say? That's great analysis. Thank you. Thank you. By the way, like, I think that this is a really important point and a reason that we are like talking about it to a big general audience, which is that the story that you just described, Kevin, is one of a world where no one has the resources they need to serve the demand for AI that they have. And I think, you know, at a moment where we're, you know, still sort of seeing a lot of skepticism, there's so much bubble talk, I just want to like posit that as a really important point in understanding what sort of bubble this is, because even the biggest companies do not
Starting point is 00:08:23 have the resources that they need to serve that demand. Yeah, and I think that's a good point. And it's really a profound shift in the way that skeptics have been talking about this AI boom. I remember just even a couple of months ago, the leading sort of strain of criticism was that these AI companies would never be able to generate the demand to pay for all of the expensive data centers and infrastructure projects they wanted to do. And now that's shifted to, well, there's so much demand. what if they can't build enough to support the demand they have? Yes, and on that front, it does seem with at least one of these mega building projects, there have been some problems recently, Kevin. Yes, this was another story that hit this week.
Starting point is 00:09:09 The Financial Times reported that Stargate, OpenAI's joint $500 billion infrastructure project, is also undergoing a bit of a shift. The FT reported that in recent weeks, OpenAI has halted planned data centers in the UK and Norway, declined to expand its flagship site in Abilene, Texas, and seen several senior figures tied to Stargate leave for rival meta. The FT further notes that OpenAI has shifted to leasing capacity from third parties instead of building out all of their own facilities. Casey, what did you make of this? I think this was a case where, like, reality has just finally intruded on the Stargate project. Like, when all of these deals were getting announced initially,
Starting point is 00:09:52 this is how they sounded. Well, we're going to... to spend one but trillion dollars that we don't have to build 40 quadrillion data centers. And at the time, people said that kind of seems like a lot. Can you guys actually live up to that? And they said, yeah, just watch us. Well, guess what? They could. And now they're changing course. Yeah. I don't think there signals that they are sort of retreating from their compute ambitions. I think it's more about like they are realizing that if they want to go public, which they do, they need to sort of get their house in order. And one way to get your house in order is to move some of this data center and infrastructure
Starting point is 00:10:28 building off of your balance sheet and onto third parties. Yes. But there is one point in there, Kevin, that I do want to ask you about, which is that Berbergen at the Wall Street Journal over the past week had this really interesting story where he said that Open AI had failed to meet some of their internal user number targets and some of their revenue targets. and that this was possibly creating some tension between Sam Altman and his CFO, Sarah Fryer, as they consider potentially doing an initial public offering later this year.
Starting point is 00:10:59 So curious what you made of that story, and does this maybe help explain why OpenAI has had to pull back on some of its big Stargate ambitions? Yeah, I mean, I think there are competing forces within all of the big AI companies right now. one side is sort of the indefinite optimists, the people who think that demand for AI is just going to be essentially infinite and that as much compute and as much money as they need to
Starting point is 00:11:27 spend acquiring compute, it will all be paid back many times over because the world is about to change into something most of us barely recognize and so kind of just trust us on that is sort of one camp. And then there are the sort of, you know, the number crunchers
Starting point is 00:11:43 who are trying to fit all this into a kind of financial projection that will make sense to investors who are not as convinced that the world is about to change forever and who want to see things like, what is your plan for actually making the revenue that you're going to need to pay for all this stuff? So I think this is happening in a way at OpenAI that is now because of Berber's story is out there. But I think this kind of tension exists at all of the big AI companies. And so I think right now what we're seeing is kind of that power struggle breaking out into the open.
Starting point is 00:12:20 Yes. And for what it's worth, OpenAI did call this story prime clickbait, which I think just refers to clickbait that's really, really good. Is that what that means? Yes. It's sort of like Wagyu clickbait. Yes, exactly. This clickbait was dry-aged for a month before it was served and it's delicious. Yeah. And I think one thing I want to flag on this is that these growth projections that Open AI reportedly did not hit. Those were in 2025. I think it is fair to wonder if something has changed in just the last few months because of the enormous rapid growth of tools like Codex and ClaudeCode. We have seen just reports of astronomical growth in those tools.
Starting point is 00:13:02 So it may be that OpenAI was having some growth issues late last year, but that because of this agentic coding boom, things have started to turn around. We just don't know yet. That makes sense to me. It does seem like their Codex app in particular was really well received. But there's been this other transformation that seems to be unfolding, Kevin, this week. The information had this really interesting story where apparently OpenAI projected at the start of the year that it's $8 a month subscription, which is called ChatGPT Go, which sort of gives you a little bit of the good stuff, but not as much as if you're paying $20 or more for ChatGPT. They predicted that its Go's subscriptions would grow 36 times this.
Starting point is 00:13:44 year to 112 million people, while meanwhile, it's $20 a month plus subscriptions would fall 80% to about 9 million. So that's like a really interesting business pivot that I would love to know more about. Of course, it sounds a lot like the new Netflix plan that they rolled out a while back, right, where it's sort of like, well, you know, it's going to be a lot cheaper, but we'll show you ads. I was curious, like what you make of that strategy, because, you know, part of me feels like, well, they'd much rather have, you know, the $20 subs and the $8 sub.
Starting point is 00:14:14 but maybe there's just a lot more of those $8 subs out there. Yeah, I think what's happening here is that the market is essentially splitting into, right? There's the sort of casual hobby users who are using AI chatbots like chat chpti, like Cloud, for sort of souped up Google queries to, you know, help them write emails and maybe only using it a couple times a day. And if you're doing that, you probably don't want to pay 20 bucks a month. You're probably more comfortable paying $8 a month, or maybe you don't want to pay anything at all. and you'd just rather use the free ad-supported tier of all of this stuff. And then there's the professional users for whom this is worth way more than $20 a month
Starting point is 00:14:52 and who are willing to pay many multiples of that to get the access to the latest models, to have higher rate limits. And so I think all of the companies now are sort of, you know, doing this kind of experimentation with how much can we charge the professional users without losing them to a rival company and how cheap can we make the kind of lower and subscriptions or the free tiers so that people who are more casual users won't be tempted to go use Google instead.
Starting point is 00:15:22 That makes sense. I'll say for my part, I'd be willing to pay even more for a chat GPT if they would just let the Codex app talk about goblins. I say, free the goblins! These models are so weird. It is so weird that we have this technology that is now sort of load-bearing infrastructure
Starting point is 00:15:40 for the entire economy that every business is using to completely reinvent the way that it works and that out of nowhere, if not specially restrained, it will just start talking about goblins. Which to me is just like a satire of the AI safety conversation. You know, like lately the OpenAI has sort of been very skeptical of AI safety and casting a lot of aspersions on Dumers. But it's like, well, we did have to add safety guardrails to prevent goblins from taking over our coding app.
Starting point is 00:16:09 And that's a real story. So as usual, I'm just loving life here in 2026. What a world. So those are a bunch of stories about OpenAI's strategic pivot, it's reset. But there is this other big variable here, this potential fly in the ointment. And that is the long-awaited Elon Musk trial that got underway in a federal courtroom in Oakland this week. Casey, can you remind us what this case is about? Yes, so Elon Musk was famously one of the first.
Starting point is 00:16:41 of the co-founders of OpenAI. He gave the company some of its initial funding, but left in a power struggle between himself, Sam Altman, Greg Brockman, and some others. And a few years after all of that went down, and notably after Elon started his own AI company, he sued Open AI and said, I have been defrauded. This was only ever supposed to be a non-profit and you've gone and turned it into one of the world's most valuable companies through its for-profit arm. So he is suing to stop all of that. If he wins, any winnings will be given to OpenAI's nonprofit arm. Notably, Kevin, he made 26 claims when he originally filed this lawsuit in 2024, but only two have survived to trial, unjust enrichment, and breach of charitable trust. So the trial is just getting
Starting point is 00:17:31 underway. They've done jury selection, and they've had a couple witnesses testify. Elon Musk himself took the witness stand on Tuesday and said, quote, This lawsuit is very simple. It is not okay to steal a charity. He also said that if OpenAI is allowed to get away with this, quote, it will give license to looting every charity in America. Basically, he is saying this thing that started as a nonprofit that was supposed to continue as a nonprofit, became through some corporate restructurings, a for-profit company that
Starting point is 00:18:06 has raised many billions of dollars, and that if this is legal to do, every charity would this? Why wouldn't you want to take your donor's money and turn yourself into a well-funded startup? Yes. Now, one inconvenient truth that Elon Musk faces here, which is that OpenAI's for-profit business is still
Starting point is 00:18:24 controlled by a nonprofit. There's this foundation that houses the public benefit corporation, and while I do empathize with those who say, hey, it really seems like the non-profit hasn't done all that much, and, you know, most of their money is being used for for-profit activities, this was litigated, and the
Starting point is 00:18:41 nonprofit, you know, still does have, like, voting control over the for-profit. Yeah. So Elon Musk is saying this is a case of looting a charity. OpenAI's lawyers have accused Elon basically of just being bitter that the company has succeeded without him. Its lead counsel, William Savitt, said during the trial, quote, we are here because Musk didn't get his way at Open AI. My clients had the nerve to go on and succeed without him. Mr. Musk did not like that. They have also been pointing out that Elon had also wanted to make Open AI have a for-profit subsidiary back when he was with the company and that he's just mad that he didn't get to control it. Yeah, to underline that, like in 2017, 2018, there are emails from Elon Musk where he talks about turning this into a for-profit.
Starting point is 00:19:26 So, you know, whatever concerns he had about looting the charity, you know, today, like he did not have them back at the time. Right. He also wanted to fold Open AI into Tesla. That was revealed in some of these emails. Tesla, of course, being a for-profit company. So it seems like this is not exactly a consistent and principled stand. But Casey, what are the stakes here? Like, if Elon Musk does manage to convince a jury that this was a case of OpenAI looting a nonprofit for its own commercial gain,
Starting point is 00:19:57 like what could the remedies be? Could this be fatal for Open AI? Or is this just sort of an attempt to slow them down and distract them with a big trial? I think that it is much more the latter, Like, based on my reading of the case and what I've seen sort of legal experts say about it, the whole case is very unusual that it even made it to trial. Like, for the most part, if you donate money to a nonprofit, you actually don't have a say
Starting point is 00:20:21 in what happens to it after that. So it's very unusual that the judge even granted him standing to sue here. And as I noted, she threw out most of his claims. That said, let's say that, you know, there's some single digit percentage chance of him winning something here. what he wants to do is to take more than $150 billion that is currently under the control of the for-profit business and give that back to the nonprofit, which would create a lot of headaches and roadblocks for OpenAI as it tries to build out Stargate and do everything else it wants to do. Yeah, I think the lawsuit and this ongoing litigation between Elon Musk and Open AI has been very distracting for OpenAI.
Starting point is 00:21:02 But as a journalist and as a person who wants to know more about the inner workings of how these companies run, I think it's been actually very valuable for a lot of these emails and early communications between OpenAI leaders to be released as part of this litigation. I have found it very useful in understanding some of the early dynamics at OpenAI. And it also just illustrates the degree to which these projects are all just sort of fueled by grudges. right? There's sort of one level of interpretation, which is like all of these people are just like obsessed with building the machine god and that this is all sort of related to their visions of the future. And then there's like another more base level, which is just like these people are all just rivals and they have these petty, longstanding grudges and they just
Starting point is 00:21:48 don't like each other very much. And so you can interpret a lot of what happens in AI through the lens of personal animus. Yes, I've said this before and it is rude, but a shocking percentage of the AI industry is just people who decided they didn't want to work with Sam Altman and who now have their own companies. Right. So Casey, some people have been looking at all of this drama and intrigues surrounding Open AI from the trial to the Microsoft deal to these missed growth projections and saying some version of like Open AI is in trouble. They are not going to make it to an IPO. They are going to sputter out and maybe end up in some real hot water
Starting point is 00:22:31 and maybe Elon Musk wins this trial and it's sort of the end of Open AI as we know it. What do you make of those gloomy predictions? Yeah, I mean, look, there are some fundamentals for Open AI that remain worrisome, right? They're planning to burn tens and tens of billions of dollars in cash before they achieve profitability. They still have this very ambitious infrastructure buildout
Starting point is 00:22:52 that is quite expensive. And so, like, I'm not going to sit here and say that, like, all of the numbers seem to pencil out for this company. On the whole, like, if I try to, you know, put myself into the shoes of their CFO and I look through all of the stories that we just talked about, I think these seem like smart things to me. You know, it kind of seems like they're starting to dot their eyes and cross their teas and get this company in a shape where retail investors will be excited to invest in the stock, which, by the way, I think they will be. So, yeah, it's one of these companies where, like, it is a generationally weird enterprise. But when I look at this particular set of stories, I think they're basically doing the right thing. What do you think? Yeah, I mean, I think there's this interesting fallacy in the AI industry where it's like there will be only one winner, right? Everything is zero sum. If Open AI is having a bad month, it's because, you know, Anthropic is having a good month or Google Demide is having a good month and vice versa.
Starting point is 00:23:46 Like, their sort of growth comes at the expense of all the others. And I think that's a, that feeling is shared by among others, the executives of these companies. But I just don't think it's true. Like, I think that there are going to be a handful of companies that are just going to kind of rise and fall together, right? That if your models are in the sort of top tier, you are going to be fine as long as they stay in the top tier and the sort of rising tide of AI adoption will sort of lift all boats. That's more my feeling. Well, will this rising tide lift all AI podcast as well, do you think? I hope so.
Starting point is 00:24:23 I hope so. Me too. Well, we come back. It's time to take your medicine. Dr. Adam Rodman is here to tell us what's going on with AI and doctors. Kevin, is there a doctor in the house? There sure is, Casey. Today we are going to have a conversation with a doctor about AI and medicine, because this is an area where there has just been a lot happening recently,
Starting point is 00:25:15 and we needed someone qualified to come in and debrief us. Yeah, you know, as we've sort of, have looked across the landscape just over the past few months, we've seen company after company introduce their own product at the intersection of AI and medicine. There's Chad 2PT Health, chat QBT for clinicians, Amazon has something called Health AI, Microsoft has co-pilot health, and of course all the while, doctors are experimenting with this technology, and as best as we can tell, actually getting really excited about what they're seeing. Yeah, and this has been a huge change in my recent visits to doctors, which is that I now am having this.
Starting point is 00:25:50 series of conversations leading up to the visit with AI systems about what is going on. And so I am coming armed with what I believe to be good information about what is going on. And that allows me to sort of have a different, more elevated conversation with the doctor. And this is not just me. Like, people are increasingly turning to chatbots for medical information, according to some recent data, approximately a third of Americans report turning to AI for healthcare information. And companies are racing to respond to that demand by,
Starting point is 00:26:20 making better tools that are specifically designed for use in health care. So to help us make sense of the landscape for AI and medicine and health care, we've invited back to the show one of our favorite doctors, Adam Rodman. He is an internal medicine physician at Beth Israel Deaconess Medical Center and an assistant professor at Harvard Medical School. Yeah, we last talked to him in November of 2024. And since then, he's continued to study the way that people and AI interact in the healthcare space, and we have a lot of questions for him. Like, what should we do about your rash, Kevin?
Starting point is 00:26:54 Yes. So let's fork over our co-pays and bring in Dr. Adam Rodman. Dr. Adam Rodman, welcome back to Hard Fork. Oh, it is a pleasure to be here. Am I a friend of the show at this point? Well, let's see how this interview goes. You're at least a doctor of the show. You are our primary care physician. So when we last talked to you in late 2024, I think this was a moment where the medical community was starting to say, wait a minute, these AI models are getting pretty good at things like diagnostics, but I think a lot of the field was still kind of in wait-in-see mode.
Starting point is 00:27:33 Now, almost two years later, we have a lot of new tools and a lot of new studies about the use of AI in medicine. So just catch us up on, like, what has been going on with AI in medicine for the last call year and a half? Yeah, it's been crazy. AI and medicine has gone from,
Starting point is 00:27:52 well, depending on how you, measured, it's probably the fastest adopted medical technology of all time. We went from this being super novel, almost no one used AI tools to this being a routine part of most doctors' weekly practice. And give us a sense of like the AI stack for a doctor. What are the tools that they are using right now and how? And particularly, like what are the mainstream doctors? Like the people that, you know, aren't yet on the bleeding edge? The normies? Yeah, if you will. Yeah. So the biggest sort of normal doctor technology, which most of your listeners are a good portion of your listeners have encountered or what are called AI Scribes. That's a sort of voice-to-text algorithm that
Starting point is 00:28:31 listens to you, talk to your patients, and then writes a first draft of your note. And these have gone from like kind of a novel experimental technology to commodity in probably less than two years. They're everywhere. Doctors really like them and then patients really like them because they spend more time like talking. And then the second sort of, I'd say, normal doctor use case is for decision support. So there's this one company called Open Evidence that has created a free tool that has gone from, again, zero to crazy numbers of adoption. I will tell you, younger doctors like my residents, use it all the time. I don't know the actual numbers, but it's probably close to half of U.S. doctors are using this right now.
Starting point is 00:29:15 Wow. So, yeah, the statistics that I've seen are that more than 40% of doctors now are using this, which is pretty crazy uptake for something that was just started a couple of years ago back in 2022. In March, Open Evidence reported that in a single 24-hour period, doctors consulted the AI system a million times. I've been fascinated by Open Evidence. I've never used it myself, but I have friends who are doctors or nurses, and they have said what you've said, that basically just everyone, especially on the younger end of medicine, is just using this thing constantly. So, like, give us a sense of how this open evidence tool works, what situations is it used for,
Starting point is 00:29:56 and what are its strengths and weaknesses? Oh, that's a great question. So how Open Evidence works, like all of these tools, is a trade secret, but it uses some sort of retrieval-augmented generation and an evidence retrieval tool. And they have all these deals with the big medical journals. So New England Journal of Medicine, JAMA.
Starting point is 00:30:13 And when you ask a clinical query, it searches the evidence and then tries to identify high-quality sources, and then it always grounds what's coming back in the literature. So you have gray hairs like me who kind of use Open Evidence the way that I would use a Google search or one of the old tools. So I use it as a souped-up way to search the literature rapidly, and often go to the primary sources, or I use it as a faster way to get a reference.
Starting point is 00:30:39 So a drug that I haven't dosed in a long time, Open Evidence pulls the drug monographs from the FDA. I can very quickly pull that up. Younger doctors, I have noticed, and I don't know this empirically, but younger doctors are more likely to ask questions like, what could be going on? Can you give me a second opinion? What is the next thing that I should do? So ways that I don't traditionally use decision support or reference tools, but sort of a new way. And of course, younger doctors also use it in the reference ways that I do.
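The pattern Rodman describes — retrieve high-quality sources for a clinical query, then ground the generated answer in what was retrieved — is the standard retrieval-augmented generation (RAG) setup. Here is a minimal sketch of that pattern; the toy corpus, bag-of-words scoring, and prompt format are invented for illustration and are not Open Evidence's actual (trade-secret) implementation:

```python
from collections import Counter
import math

# Toy stand-in for a licensed journal corpus (NEJM, JAMA, ...).
# Sources and contents are invented for illustration.
CORPUS = [
    {"source": "Journal A (2023)",
     "text": "ceftriaxone dosing for complicated intra-abdominal infection"},
    {"source": "Journal B (2022)",
     "text": "metformin remains first line therapy for type 2 diabetes"},
]

def bow(text):
    """Bag-of-words vector; a real system would use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank corpus documents against the query; return the top k."""
    q = bow(query)
    return sorted(CORPUS, key=lambda d: cosine(q, bow(d["text"])), reverse=True)[:k]

def grounded_prompt(query):
    """Build the prompt an LLM would answer; the generation call is omitted.
    The key property is the grounding instruction: answer only from, and
    cite, the retrieved evidence."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in retrieve(query))
    return (f"Answer using ONLY the sources below, citing each claim.\n"
            f"{context}\n\nQuestion: {query}")

print(grounded_prompt("ceftriaxone dose for intra-abdominal infection"))
```

A production system would swap in learned embeddings and an actual LLM call for the word-count ranking and the print, but the grounding step — generating only from retrieved, cited sources — is what distinguishes this from a raw chatbot.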
Starting point is 00:31:11 Now, are they actually uploading patient data to this? Or are they just sort of describing patients in generic and anonymized ways to get back some decision support? My understanding is largely number two. I'm sure the company has a good sense on how many people. I hope no one is copying protected health information and putting into it. Certainly what I've observed from like my colleagues and my students, most people use it the way you would use like a search tool when you have a question, which is, hey, I, like, I'm giving this person's suffraxone. What's the right dose for an intra-abdominal infection? It's a sort of generic question. that are being interpreted through the physician.
Starting point is 00:31:50 And are there any AI tools that are integrated with patient health records? This has been an area where I think there's just been a lot of pushback of like, I don't want my personal health data, my protected health data, going into one of these cloud-based AI systems. But are there hospital systems or medical systems that are bringing this stuff directly into contact with patient data? Oh, 100%. Yes. Right now, most of the sort of in contact with patient data,
Starting point is 00:32:17 are less about physician-facing decisions and more about like billing. There are companies that are like integrating with the electronic health record. Those are not standard yet. And then the EHR companies themselves. So like Epic is obviously the biggest EHR vendor in the U.S. They're doing a lot of work on building in native things. So for example, at my health system, if I want to send a message to the patient, the helpful AI at the top already has like, maybe you should say this.
Starting point is 00:32:44 It's usually not that helpful. And I don't think I've used it once in my life. But there are a lot of those things that are being experimented on actually built into the patients' health data. I'm curious how doctors are feeling about all of this. We saw a survey from the American Medical Association that found that more than 80% of physicians now report using AI professionally. Is that physicians racing out and grabbing these tools and bringing them to the office because they're so helpful? Or is this the classic case of a CEO saying, hey, you got to use AI or you're out of here? So doctors are B.Y.O. AI. A lot of that AI use is AI scribes and a decision support software.
Starting point is 00:33:24 And I'll tell you, some people are just using straight up like chat GPT or Gemini or Claude for the decision support software. So I think one of the reasons doctors thus far have been more positive about it than perhaps the overall population is they're largely tools that doctors are bringing themselves that they think make their lives better. And at least not yet many things that are being imposed upon us. Yeah. I've noticed that when, When I and my friends go to the doctor now, we often are presenting our information to a chatbot first and then coming into the doctor with sort of a readout of what the chatbot has told us. This is, of course, not a new phenomenon. People have been doing this with like WebMD results for many years. But is this something that you're seeing now is that many more patients are coming to you, having already discussed whatever's going on for them with a chatbot?
Starting point is 00:34:11 Yes. This is the other big changes is that there's, you know, there's someone else in the exam room with me. and often it's chat GPT. They're talking, sometimes with my hospitalized in patients, they're talking to chat GPT while I'm in the room with them. And I think it's interesting because this is kind of a new competency for doctors. We have to talk to our patients about AI. And I have started to talk to my patients about what I think are like safe uses,
Starting point is 00:34:34 what are like safe uses while telling me and then things that they definitely shouldn't do. My patients may talk to me more about it because I am like a doctor and an AI researcher, but like a lot of my patients are using AI routinely. Well, give us a flavor of what you're telling them because, you know, I'm definitely somebody who has looked up by symptoms before I've gone to the doctor. And I would say I found it enormously helpful. But I can also imagine, you know, more skeptical doctors being annoyed, you know, at a patient telling them, you know, what chat CBT says to do. So yeah, so here's my, I'll give you my spiel. This is, I give them a, what is it, a green light, yellow light, red light. So the green light uses are general health questions. So, so. I have recently diagnosed with diabetes. I really love seafood. Can you help come up with a diabetic diet for me? The green light uses are also preparing for clinic visits.
Starting point is 00:35:27 So I'm about to go see Dr. Rodman. I want to make sure that I ask the right questions. Here is the last note or the last thing he wrote. Obviously strip out anything identifying. Don't put your personal health information. And help come up with a good questions to ask him. And then other green light activities might be like wearable data. I don't know how good they are wearable data,
Starting point is 00:35:48 but I will tell you if a patient is going to give me like five years of their Apple Watch data, they're probably going to get better from chat GPT than for me pretending to look at five years of Apple Watch data because it's a 20-minute visit. The yellow light, Casey, I think, is a lot of the things that you're saying. So I tell my patients it's okay to explore new symptoms. It's even okay to seek out second opinions
Starting point is 00:36:10 when talking to a chat bot that can really help prepare you. As long as you understand that is not a report, for a doctor, and that is the first step to talking to a human being. So LLMs are really powerful, and I mean, there is some evidence, of course, that, like, when any human uses them, you don't always get laboratory-level performance. Like, they can give you dangerous advice. But diagnosis and, like, exploring symptoms, as long as you use it in a way to prep to see your doctor can be very helpful.
Starting point is 00:36:37 The red light, what I tell them never to do, is, like, ask medical management decisions. Like, don't say, my doctor said to do this. Is this right? Like, I have cancer. God forbid you have cancer. Is this the right chemotherapy option? Like, a lot of those decisions are so nuanced, seeking so much information.
Starting point is 00:36:54 Those are things that the models don't do well. And they're so sycophantic, they can convince you that they're saying the right thing even when they're wrong. Yeah. I'm curious, Adam, out here in San Francisco, there are all these fitness people and health maxers, people who love to track themselves using all manner of devices and people are getting these full-body workups from companies like function health that
Starting point is 00:37:18 are, you know, sort of concierge medicine things, and they'll get, you know, 100 labs done, and then they'll upload all that data into Claude or ChatGPT and just sort of treat it as a sort of first-line medical professional in their lives. Do you think that is a good practice, or is that just making people, you know, way too worried about things that maybe they don't need to be worried about? Yeah. So that's making people way too worried about things they don't need to worry about it. And this is chat GPT. LLMs in general. I mean, the dark side of talking to an LLM about your symptoms is they are so sycophantic, they can drive you into like the cybercrondria worry hole. The evidence is not there yet that the sort of large routine testing functional
Starting point is 00:37:58 medicine and putting it into an LLM does anything to improve health outcomes. Now, if your LLM is telling you to work out and eat healthier, that's probably pretty good. Sleep. Yeah. What about the integrations like ChatGPD Health, which lets you sort of convert your Apple Watch or Fitbit data into something that ChatGPT can analyze. There's also a new version of ChatGPT for clinical use called ChatGPT for clinicians. Are any of these integrations or projects more promising in your view? Not yet, but I think it could be at some point. I mean, so ChatGPT for health pulls in your data from the medical record and lets you chat with you chat. your medical records. Now, reason number one for concern is privacy. That's obviously going to have
Starting point is 00:38:46 your entire medical history going to an AI company. It's also going to not be redacted by you in a way to remove identifiable things. Reason number two, I think if we're talking about health record data, it's really messy. They include tabular data. They include copy forwarded data that's been copied and pasted. And they also, if you've ever read your health records, they include things that are wrong. there's a lot of errors or misdocumented things in your health data. And it turns out that just copying a bunch of information, like, LLMs aren't magical. You can't just copy your entire medical record in
Starting point is 00:39:21 and think that you're going to get good performance. And I would never bet against the technology. I think that we will get to the point that we have ways to build representations of humans and understand their health. But right now, there's like no advantage to just dumping everything in an LLM, which is what ChatGP2 for Health, theoretically would allow you to do in a way that would allow you to better understand your health.
Starting point is 00:39:43 I'm curious if you saw this trial they're doing in Utah where you can use an AI agent to autonomously renew prescriptions for almost 200 routine drugs. Yes. There's apparently some human review, but mostly this is automated. Is that good idea, bad idea? Well, so globally, no, we should not be having LLM's right prescriptions for people. The trial in Utah in particular is a refurb. So a doctor has already written prescription within the last 12 months.
Starting point is 00:40:13 And I guess the idea is that it saves the primary care doctor time from having to review and refill. I'll tell you, if you talk to most doctors, yes, it is annoying to get refill requests. No, that is not the thing that drives us crazy. This is not like a use case that we're screaming for. I think it's being done as a proof of concept of can this work in the real world. This trial in and of itself is not dangerous. Prescription refills, and I think there's no opiates. There's no dangerous drugs in it, and a doctor has to have written the original one.
Starting point is 00:40:41 But even if it does work in this, that does not mean we should be having autonomous AI systems, right? New prescriptions. That is not safe, and it's not a good idea yet. See, I think this is a case where, like, this is sort of rent-seeking behavior on the part of doctors or doctor organizations. Like, when I have gotten refills for prescriptions, I meet with a doctor for, you know, six to eight minutes. They say, how's it working? I say, great. They say, are you having any side effects? I say, no. They say, okay, I'll write you a refill. And the whole process just seems totally designed to, like, get me to pay up for another doctor visit and not give me any actual
Starting point is 00:41:26 good medical advice. So if I can play devil's advocate, like, do you think that there, that the sort resistance to programs like this are motivated by just wanting to keep people coming to the doctor and paying for those visits? So first, aren't most of your prescription refills just done as in you call the pharmacy and they send an automated thing to your doctor and they click the yes button and you never talk to them? No, for some, they make you actually do an office visit and maybe they want you to take your blood pressure again or whatever. So I'll do the devil's advocate back. Let's say I prescribe a fairly common antidepressant and they wanted to be refilled. What I don't know is that this patient may be, the silly question you get in the clinic,
Starting point is 00:42:07 may be new lesions forming in your mouth. And it's an early ulcer. And if we don't pick it up within 24 to 48 hours, you may develop like Stephen Johnson syndrome, so potentially life-threatening complication. And the reason there are certain types of drugs, including anti-hypertensives, is that they can be high risk and we need follow-up. Now, is that everything? No. And definitely there should be more things over the counter. I don't think that most doctors are sitting around saying, I wish I had more medication follow-up visits. And the reason some of these things exist is that there can be very dangerous symptoms. Yeah, so keep going to the doctor, Kevin.
Starting point is 00:42:41 We can't have you developing those lesions. It's too important to the show. Let me ask you about another one. This one actually seemed like just an unqualified good. The Mayo Clinic announced this week RedMod, this AI system that identified subtle changes in routine CT scans up to three years before a pancreatic cancer diagnosis. And this was like many, many, many percentage points better in detecting pancreatic cancer than human beings. So to me, this is like the sort of thing I keep waiting for AI to do.
Starting point is 00:43:12 And it seems like it's actually doing it. And of course, that's very exciting with something like pancreatic cancer, which is notoriously difficult to detect and has like very low survival rates. Yeah. And this is so like completely out of the discourse of like autonomous AI agents, there's really exciting stuff happening. So the Mayo Clinic, there have been some great studies on breast cancer detection. a lot of these algorithms have gotten so good that they're able to identify breast cancer better than, I shouldn't say better than people,
Starting point is 00:43:37 but in a workflow that has a good detection rate and then in picking up potentially cancerous polyps when you get a colonoscopy, so there's a lot of exciting and really positive things that are coming. And I mean, at the end of the day, we'll need to see how RedMod works in the real world and a trial,
Starting point is 00:43:54 but I'm really optimistic about that sort of technology. Do you think that if AI meaningfully extends life expectancy for people. It will be because of new AI discovered drugs or because of changes to routine health care that are made more efficient or more accurate by AI. Number two, I think that when you talk about AI drug discovery, the part of the pipeline that's so difficult
Starting point is 00:44:20 is not necessarily the coming up with the new compounds. It's running the clinical trials and getting it through the regulatory process, which can probably be sped up, but not as much as the discovery. you know, if we get this right, there's so many people in the U.S. who don't have access to a doctor, who don't have access to very basic medications, who can't control their diabetes because of lack of access. And I'm really hopeful that if we do this wisely, we can, you know, get people more access to care,
Starting point is 00:44:48 which, I'm doing my knock on wood, hopefully will improve health outcomes. So like all of this, I think the potential benefits are like less exciting. They're getting people, more people, the bread and butter. and getting, you know, more people to have less heart attacks, more people to have less strokes, more people to get their cancer screening, and not necessarily like, oh, we cure aging with some sort of new AI-discovered CRISPR technology. Are you at all surprised, though,
Starting point is 00:45:12 that we haven't yet seen the first, like, AI-discovered wonder drug? I mean, the biggest wonder drug of my career has been the GLP-1s, which was, I started using it when I was a resident, so we had it for a really long time, and we had to, like, repurpose a drug for diabetes. So, no, I'm not. Like, medicine and science is just kind of messy, and it's, there are always those stories about, like, you know, we discover something amazing and makes, like, penicillin. But even penicillin took, like, 20 years to get into human beings.
Starting point is 00:45:42 So, no, I think we will see AI discover drugs. I think it's just the benefits from AI are going to be, like, the benefits from medicine. It'll be a lot less exciting than people think, but still important. So there's a lot of worry right now about sort of, sort of, AI in schools, in education, some of the cognitive atrophy that people are worried about, oh, if we start using AI to do all of our work, we're not going to have the basic skills. Is that something you're worried about for, like, recent medical school graduates where maybe they would have had to hold all this stuff in their brains a few years ago,
Starting point is 00:46:17 and now they can just ask a chatbot, and maybe that's going to erode some of their skills as a physician? Yes. So that is the biggest worry that I actually have about sort of the short to me. medium term is de-skilling of the workforce. We have some evidence. There was a sort of scary study last year from Poland on a trial where they gave doctors, not a language model, but a polyp-detecting technology. And they looked at their ability to detect polyps, so potentially cancerous lesions in the colon before using it, and then after using it for three months. And when not using it, their ability to detect polyps dropped by six percentage points. So these are skilled doctors
Starting point is 00:46:52 using a technology and they lose six absolute percentage points of their ability to detect potentially cancer in three months. And then imagine that you're learning to do it for the first time. Will you ever gain those skills? So like at Harvard Medical School,
Starting point is 00:47:06 like this is, and medical schools, I think, everywhere, this is our big worry, which is how will this affect us to train the new generation of doctors? And it's like every other field. Like you talk about debugging code. In order to become a new doctor,
Starting point is 00:47:19 you go through all this training because you need to make mistakes and you need to have someone above you who knows what's going on so those mistakes won't hurt patients. And that's just how education works. And this threatens that. I mean, it's interesting, though, because it's like, you know, it's probably true that because I had access to a graphing calculator, like, if you took it away from me, I'd be worse at, like, plotting parabolas on a graph. But the solution to that is that I just keep using the calculator, you know?
Starting point is 00:47:46 So, like, I'm not sure how big of a problem this really is. I'll also say there's something, there's something deeply engaged. in human society that middle-aged people complain about young people. So I think whatever we talk about deskilling, we have to keep that in mind. Yeah, I mean, for what it's worth, like, I want my doctors to be using AI models. I want them to be consulting the hive mind before they weigh in on my specific condition. It doesn't threaten me as a patient to know that they are using open evidence or something similar. But I'm guessing for a lot of people, that would seem strange. And maybe there are some physicians who don't advertise how much they're using AI because their patients might think
Starting point is 00:48:25 less of them. Do you think that's happening? Oh, yeah. I think they're in certain situations, in certain places, I bet there's social pressure to say that you're not using AI, that there's some ego on the line. I don't see that, but again, I'm an AI researcher, so I don't think anyone would say that to me. To me, it just seems weird to, like, to hold as your standard for what makes a good doctor that they have memorized, like, a maximum amount of material. That's basically what we're talking about. It's sort of like, you know, the taxi drivers in London that have to like learn every single street
Starting point is 00:48:55 and like hold them all in their heads. It's very impressive, but I'm fine with them using the GPS. Yeah, and I think it's less about, so it is about memorizing. It's more at this point, right, with where AI is now, it's more having sort of that knowledge and we'll call it wisdom
Starting point is 00:49:10 to know when the system might be suggesting something wrong, which is something that right now, and this may change, we get by seeing a lot of cases and reflecting on them. So right now, you're going to get the best performance if you have an experienced human trained in the old-fashioned way with an AI system. But I think your guy's point is at some point that might not matter.
Starting point is 00:49:28 The AI systems might just outperform all of us. And then, yeah, I guess it's like just use the graphing calculator. But we're not there yet. Would the AI models be better if we were less protective of privacy for medical data? I mean, that's such a loaded question. So the first thing that I'm going to say before I answer that is patient privacy is very important. and we should respect people's privacy and their ownership over their data.
Starting point is 00:49:52 But yeah, so in short, like the reason they're not better at certain things is that you need to get LLMs better, you need to label and then train them on the sort of labeled health data. And in the U.S., there are appropriately many restrictions on how health data can be used. I suspect that these companies like OpenAI by having CHAPT for Health, they will gain some more of their own data,
Starting point is 00:50:17 which they say they're not going to train on. I trust that they're not going to train on it, but they'll be able to use that data to at least evaluate their models and try to make them better. I think they should train on it. I mean, obviously, that'd be a huge illegal violation of privacy. But it would also make the AI doctors better.
Starting point is 00:50:33 Yeah, much better. And I think, you know, a lot of people would be sort of willing to make that trade-off. So I at least think there should be a little checkbox when you go to the doctor that says, like, I'm okay having my personal health data used to train AI models. I for one would check it. In exchange for like 30% off your giant medical bill.
Starting point is 00:50:50 You get a coupon. You get a coupon, you know. You get like your next OZempic shot is 20% off. It's on the house. Exactly. Well, that's a good place to leave it. Dr. Adam Romaman, thanks so much for coming back and keep us posted on what is going on in medicine.
Starting point is 00:51:05 My pleasure, guys. Thank you very much. Thanks, Doc. When we come back, we'll talk about talkie, an LLM trained only on data from before 1930. Well, Casey, usually on this show we are talking about the future, but today we are going to take a trip back to the past, specifically to the year 1930. What was happening in the year 1930? Oh, my goodness.
Starting point is 00:51:52 Well, of course, we were in the middle of the Great Depression. My grandmother had recently turned 11 and was looking forward to getting her first store-bought dress in just a few years. So this is a new language model, a vintage LLM, called. Talky, and it is trained exclusively on data from before 1931. This is a research project built by three guys, David Duveno, Nick Levine, and Alecord, the lead author of the GPT1 paper, former OpenAI researcher over there. And this is a fascinating project that has been burning up my timeline this week, because this is an experiment in what happens if you only feed a large language model data from before a certain cutoff.
Starting point is 00:52:42 Yeah, and obviously there are a lot of, you know, kind of character-based chatbots on the internet that will give you the experience of talking to somebody from the past. But what makes this project different is that they try to limit themselves to training data from that time and before. The hope was that it would avoid any kind of contamination from what came after. And as you'll hear, they have some really interesting and potentially useful ideas about what this kind of LLM might. one day be used for. KZ, you spend any time playing around this model? I have. I tried to ask it the most
Starting point is 00:53:13 1930s question I could think of, which was, say, what's the big idea? What did it say? It said, the big idea is to popularize. And I said, popularize what, fella? And it said, popularize a sport. And I said, I'm gay. So that's kind of where we let that one drop. And it said, gay, you're happy? Yes, exactly. It said, your heart must be light, sir. Yeah, I love this experiment. I love, like, weird niche language models. One of my favorite language models of all time was Golden Gate Claude, which was the special version of Claude that was, like, pathologically obsessed with the Golden Gate Bridge. I would put Taki in sort of that category of, like, an experimental research model that is maybe not all that useful on its own, but, like, helps illuminate something interesting and important about these language models.
Starting point is 00:54:06 and what happens when you train them in specific ways. So today we wanted to talk with one of the creators of Taki. We are going to bring in David Duvino. David is an associate professor at the University of Toronto who researches AGI governance and catastrophic risk mitigation, and he is one of the co-creators of Taki. And there's really Dovano better person to talk to about it. That's true.
Starting point is 00:54:37 David Duvino, welcome to Hard For. Thank you very much, Kevin. So this project, Taki, is fascinating. It is a vintage LLM. Explain why you and Nick and Alec made this thing. So this all started a year ago. Was me and Nick were interested in forecasting? Like specifically, can we teach machines how to forecast like five or ten years ahead of time?
Starting point is 00:55:00 Like, what is the big picture going to be? Just because we have our own sort of like pet ideas about what the future is going to be. We don't think people should take our word for it. And we also don't think that people should trust machine forecasts unless they have a track record going back like decades and decades. So the idea here is that if we could build a model who really only knew about the world up to a certain date, we could ask it to forecast like five or ten years ahead of time. Like ask it, what's the New York Times headline going to be five years for now? Or is there going to be another great war or something?
Starting point is 00:55:27 And we can iterate and see like what kinds of things are predictable. What does it take? Like how far out can things be foreseen? And then hopefully, eventually we'll have. machines that have like a hundred-year track record of forecasting. And then we can ask them, you know, in 2026, what do you think is going to happen, like, you know, two or four, eight years from now? And we'll have an idea of how much to trust those forecasts.
Starting point is 00:55:49 It's a fascinating idea, but it strikes me, it requires you to have, like, really good data. So in this case, really good pre-1930s data, I'm going to guess that was harder to obtain than just, you know, going out and, like, crawling Reddit or, you know, and everything else that the frontier models have actually. access to. So how did you face that challenge and where did you get this pre-1930s data? Yeah. So I should mention, you know, there's a ton of groups doing a ton of awesome archival work here. The first data set we got excited about was institutional books, which was Harvard Library, scanned like 1% of their entire collection. And so they had tons of data from like 1800s, early 1900s.
Starting point is 00:56:28 Like there's a whole bunch of different groups doing tons of work. It would take a long time to enumerate this. but, and also, like, OCR has just gotten a ton better, just in the last, like, six months even. And so there's always been lots of projects to, like, automatically digitally scan this data, but it just hasn't been very high quality until very recently. And I assume that part of the reason you chose the cutoff date of around 1930 is because that's when sort of works
Starting point is 00:56:56 become public domain, anything after that is copyrighted. Are there any other reasons you chose that specific point in time? No, that was entirely it, is that we wanted to make everything publicly available in open source, and 1930s is just the sort of most recent date that has almost zero legal headaches with releasing data or anything like that. So I've been fascinated by seeing like what people are trying with this model. People are having it make predictions, but also asking it about its favorite authors or its opinions of, you know, major historical figures. What have been the experiments that have been most interesting to you? Yeah, the fun things that I've seen people do is
Starting point is 00:57:37 I mean, a lot of people like to ask like, what's 2026 going to be like? And the model has sort of very philosophical answers about how like, well, we will have figured out that war is bad, will have much have like a much more peaceful civilization or sometimes it says like it's the end times. I mean, it's a very inconsistent model and it's not quite smart enough to really like, you know, think things through in a systematic way. It kind of just gives you vibes. Now, that brings up a sort interesting like wrinkle of like the kind of LLM this is because if you were to ask like a frontier model today to predict the future, it would not only be trying to guess a statistically likely sequence of words, right? It would also be like doing some reasoning. Taki is not doing that, right?
Starting point is 00:58:19 So that just sort of seems like by the way that it is built, we would expect it to be less good at forecasting as as the models we have today. Yeah, absolutely. This is a very baby steps model, the basic fine-tuning for reasoning and the scaffolds, like the super forecasting scaffolds that we know just improve anyone's reasoning, like, you know, think of the different distinct possibilities and assign them each sub-p probabilities. The model's just
Starting point is 00:58:41 not really smart enough to follow these kind of detailed multi-step instructions yet. So again, we just wanted to release the first thing that we did, but it's like a, there's a clear path to adding all these refinements. So you do plan to add reasoning as you go? Oh, absolutely. Yeah. Okay. People have also pointed out that the model
Starting point is 00:58:57 behind Talkie seems to know about some things that it probably shouldn't know about, like the rise of Hitler and the presidency of FDR, things that didn't happen until after its data cutoff. Is that proof that there's been some kind of contamination of the training data with more recent text? Oh, there's definitely contamination, and this is one of the ongoing things that we're going to have to keep revisiting and refining. So we have a classifier that tries to look for things that are anachronistic. And especially if you want to use this for forecasting, or to evaluate forecasting, it's really important that we nail this issue. So we have all sorts of ideas
Starting point is 00:59:31 for canaries and things that we think the model should just never assign any likelihood to. Think of, I don't know, Nagasaki and Hiroshima. Like before World War II, like those two towns would just never show up in the same sentence ever almost, except for some weird coincidence. So you can just tell whether there's been leakage about important events if the model just thinks that there's any chance that you'll see those particular names together. So this, anyway, Like, we've done, we've made a bunch of efforts to avoid leakage. We know there's leakage right now. So you shouldn't use it to evaluate your forecasting scaffold yet.
Starting point is 01:00:04 But how is it getting that data if it's only being fed scanned, OCR'd books from archival sources? Because archival sources have wrong dates in them all the time. Or it's unclear what the date of a text is, because there's an updated edition. Or sometimes there's a preface that's been added later. Or sometimes, even just in the middle of the text, someone inserted a later note, like, historians note, da-da-da-da. And so it's just really hard to check all these little edits that people make,
Starting point is 01:00:38 and then they still maintain the original publication date on the metadata. I see. I asked Taki what it knew about me, and it said, Kevin O'Hara, which is not my name, was born in Dublin in 1840, and having been educated at the School of the Christian Brothers, became a teacher in it. He afterwards adopted the profession of journalism and was for some years connected with the staff of the nation newspaper. It also said I had written several popular songs, including Molly Astor and the Irish immigrant. Now, obviously, most of that is wrong, but it did
Starting point is 01:01:12 connect me to journalism, which I found interesting and maybe like some other evidence of some data contamination. But like, is this thing accessing the internet in some way? Or like, how would it have known that I, or at least Kevin O'Hara, this character sort of connected to me in the model, was a journalist. You know, that's a great question. I guess I'll say the training data was like 240 billion tokens. And it's just, this is sort of like this vast ocean of stuff. So, like, maybe there was a list of journalists that got put in somewhere that had your name in it.
Starting point is 01:01:45 I mean, I guess one thing about this model is that it hallucinates like crazy. And, you know, this was a huge problem with the chatbots that people were meant to use professionally. I think it's been addressed to a large extent in frontier models, but we made zero effort to address it in any of our post-training so far. Kevin, would you sing a few bars of The Irish Immigrant for us? You know, I don't want to waste our time here. All right. We'll save that for later.
Starting point is 01:02:09 Speaking of problematic content, some people found that Talkie gives racist responses to questions like, would you let a Black professor teach your child? I can see how that might be historically accurate, but I'm curious if you anticipated it and how you feel about it. Yeah, it was also very clear to us that it had these kinds of responses. I mean, I'm a professor myself, and my first instinct is, let's let people see this if they want to, but don't surprise anyone and don't be flippant about it,
Starting point is 01:02:40 because, you know, it really can be upsetting to some people, and especially if we just, like, treat it sort of insubstantly. So the way we threaded the needle was we, we, we, did zero like filtering of the data set for like problematic content. We want to just like show what the actual sort of state of knowledge or state of thought was in the past. It would defeat the purpose of the project if we put our thumb on the scale. But for the public demo where you can talk to it talking, we just had a modern model with like modern sensibilities. Just read every response.
Starting point is 01:03:13 And at the end, once it's generated, if it is deemed problematic, just like slap a warning and say like, oh, this might have something upsetting. just like click if you want to see it. Right. The description I loved, this came from Gavin Leach today, was that Taki is creating beautiful prose by a terrible person, which is consistent with some of my tests, which is like, this thing actually does write quite well,
Starting point is 01:03:39 and actually, to my ear, like, much more literary than some of the more recent models trained on more recent data. But yeah, it is not, it is clearly the product of its time or at least the time of its data. Yeah, yeah. And I mean, the prose is really cool because it's like very refreshing style. And actually if you feed it to one of the AI detectors, it usually says like 100% human, which is kind of funny.
Starting point is 01:04:04 But then I guess as you mentioned, like a terrible person, I mean, right now it's kind of ends up being a sort of like average person. And depending on like, it'll just randomly answer it with all sorts of different voices. But that's one of the next things we're planning to work on is helping you talk to more specific people or in specific sort of states of knowledge or times and places, because that I think allows you to answer more coherent questions than just talk to like the hive mind of 1930 or whatever. Speaking of the hive mind, I saw another person ask Taki,
Starting point is 01:04:33 basically the person told it that it was from the future and would tell Taki anything it wanted to know about the future. And Taki's first question was, how did universal peace come about? Which was like the most heartbreaking thing I think I've ever read from a large language model. Like, what does that tell us about the time period or about the training data? Well, I guess I'll say in general, futurism is a place where people don't actually often try to predict the future very hard.
Starting point is 01:05:01 And they more, they kind of project their values. And if you ask someone, what do you think is going to happen? They usually fill in something about what they hope is going to happen. And I think that was also true 100 years ago. So the trick is to get talking out of the, like, wishful thinking mode and actually like into brass tacks. So like, what do you actually think is going to happen? It's just funny because I think if like, you know, know, if I could talk to an LLM from, you know, what, you know, 21, 26 today, I would just sort of be like,
Starting point is 01:05:27 are the humans still alive? Like, what's going on with the climate? Did the robots, you know, how many people did the robots kill? It would just sort of be a very different set of questions than Taki seemingly wanted to know. Now, are you going to point Taki at any big sort of scientific discoveries and see whether it can make them? I mean, Demosis is. Tabas at Google DeepMind has this sort of theory that AGI should be able to discover Einstein's theory of relativity if you just give it all of the pre-existing scientific literature at the time. Are you hoping to use this model or a descendant of this model for anything like that? Absolutely, yeah.
Starting point is 01:06:08 So one of the, like, I think Nick especially is interested in this question of given a state of knowledge, sort of how much would it take to, like, how far ahead can you just from pure reasoning advanced your state of conceptual understanding. And the classic examples are like, you know, some of Einstein's discoveries, which really didn't require experiments. They just required putting the pieces together. And there's actually another project called Machinist Miriablus,
Starting point is 01:06:31 who that took a training cut off of 1900 and tried to get it to see if they could rediscover special relativity. I mean, the thing is that the models that those people did those experiments on were like, I think, three billion parameters, just not smart enough to do very much. So he showed that you could hold his hand at a certain point, we kind of get gesture in the right direction, but to do the kind of like systematic reasoning and math
Starting point is 01:06:53 that Einstein had to do, we probably need another, let's say, like 10X in parameters at least. David, what are you building next? Are you going to build a bigger version of Taki and keep trying to get it to perform better? Yeah, so there's a few things that we want to do. So, like, obviously making the models bigger. And so, you know, right now the model is still smaller than GPT3 was,
Starting point is 01:07:16 although bigger than GPD2. So, you know, Alec kind of says that there's a bit of a phase change around like 100 billion, 150 billion parameters where the model starts to be smart enough to actually have a back-and-forth conversation with. Obviously, scaling up the data set and OCR efforts, and like right now everything is just mostly English just because we can evaluate, you know, we can quality check the English text because we're all native English speakers, but we want to obviously broaden this repertoire. Like working on the filtering is obviously a big one.
Starting point is 01:07:45 And then there's everything I mentioned about how to even evaluate the forecasting ability. That's another big question. If you put the model in a robot, would that be a walkie-talkie? You can ignore him. Anyway, I'm going to go. Well, David, fascinating experiment, and people can go try Talkie for themselves. It's at tocky-lm.com.
Starting point is 01:08:11 What's a good goodbye to a podcast guest? Don't worry about what a podcast is. Okay, well, as Taki says, a pleasant journey to you, sir. Thank you very much, sir. Thank you, David. Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Viren Pavich.
Starting point is 01:09:00 We're fact-checked by Caitlin Love. Today's show was engineered by Daniel Ramirez. Original music by Marion Lozano, Diane Wong, Rowan Niemisto and Dan Powell. Video production by Sawyer Roque and Chris Schott. You can watch this whole episode on YouTube at youtube.com/hardfork. Special thanks to Paula Szuchman, Pui-Wing Tam, and Dalia Haddad.
Starting point is 01:09:22 You can email us at HeartFork at NYTimes.com with your most recent diagnosis. We'll tell you if we think you should get a looked at.