Hard Fork - OpenAI’s Big Reset + A.I. in the Doctor’s Office + Talkie, a pre-1930s LLM
Episode Date: May 1, 2026

This week, OpenAI announced a loosened partnership with Microsoft and an aggressive new strategy to secure computing power. We unpack what these updates signal about OpenAI's business strategy and whether the company can scale while balancing a trial against Elon Musk and investor concerns over missed financial targets. Then, the A.I. researcher Dr. Adam Rodman, of Harvard Medical School, returns to tell us about the most significant ways A.I. is changing how doctors treat patients. And finally, can an LLM trained only on very old texts predict the future? We're talking with one of the creators of the chatbot Talkie.

Guests:
Dr. Adam Rodman, internal medicine physician at Beth Israel Deaconess Medical Center and assistant professor at Harvard Medical School.
David Duvenaud, associate professor at the University of Toronto, former team lead at Anthropic and co-creator of Talkie.

Additional Reading:
Microsoft and OpenAI Loosen Their Partnership
Elon Musk and Sam Altman's Epic Fight Heads to Court
OpenAI Misses Key Revenue, User Targets in High-Stakes Sprint Toward IPO
Take It From a Doctor: It's OK if Your Medical Advice Comes From A.I.

We want to hear from you. Email us at hardfork@nytimes.com. Find "Hard Fork" on YouTube and TikTok. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app.
Transcript
Casey, I miss you. You are in New York? I am, Kevin, and of course, I miss you as well, but it's always fun to visit the mothership. You know, Ezra Klein just challenged me to a burping contest. So I've got that to look forward to later. Burping or burpee?
You know what? I guess I should go read that email again. Okay. Well, I miss you. We have an empty chair here in San Francisco, and it's not the same.
It's not the same, but I've been catching up on all the latest AI news, Kevin, and I had to ask, have you seen this thing about Codex and the Goblins?
Yes, this is the new update to OpenAI's Codex that is, like, obsessed with goblins?
Yes, apparently the company had to add instructions to its latest model to forbid Codex from randomly mentioning an assortment of mythical and real creatures, including goblins, gremlins, raccoons, trolls, ogres, and pigeons.
Or as we call them, our slate of guests on hard fork.
Listen, we have a pigeon coming up, and you're really going to want to hear their take.
No, I'd heard that OpenAI had been accused of gobblin' up copyrighted data, but do you have any explanation of what's been going on with the goblins?
Well, I mean, I think it's pretty clear what's happened, which is that when it built ChatGPT, OpenAI awakened an ancient evil,
and this is the last line of defense we have against it breaking containment and killing our families.
So let's hope the guard rails hold.
I'm Kevin Roose, a tech columnist at The New York Times.
I'm Casey Newton from Platformer.
And this is Hard Fork.
This week, OpenAI's Big Reset.
We'll talk about the company's new business strategy and its dramatic trial with Elon Musk.
Then, Dr. Adam Rodman returns to the show to tell us about the latest advances in AI and medicine.
And finally, can an AI made out of very old texts still
predict the future? We're talking about Talkie. So, Casey, there's been a lot happening with OpenAI
in particular this week. There seems to be something of a major strategic reset happening over there.
They've got a new deal with Microsoft, an expansion of their deal with Amazon, changes to their
Stargate compute strategy, and a new push toward new kinds of ad-supported subscriptions.
And, of course, they've got this big trial with Elon Musk that started this
week in Oakland. So let's get through all of it. But first, before we do that, let's make our
disclosures. I work for the New York Times, which is suing OpenAI, Microsoft, and Perplexity.
And my fiancee works at Anthropic. Okay, so let's start this week with this new Microsoft
deal. So Microsoft and OpenAI have, of course, been partners for many years. Microsoft remains the
biggest investor in OpenAI. Their stake is valued at about $135 billion. But their relationship has
also been strained over the years by various factors. And this week, they seem to be sort of
consciously uncoupling, or at least rewriting their partnership agreement and allowing OpenAI
to be a little bit more promiscuous in who they do deals with. Yeah, I mean, OpenAI just had this
real challenge, which was that until this week, they were really only allowed to serve their
models on Microsoft's infrastructure. And one thing we talk about on the show a lot is just that,
a lot of the big cloud service providers,
their infrastructure is just maxed out,
and Microsoft is one of those.
And so for OpenAI's revenue to grow,
they needed to find other ways
that they could deliver their services.
And so to my mind,
that was maybe the most important thing about this deal.
Yeah.
So under this new rewritten version
of the Microsoft and OpenAI deal,
Microsoft will no longer have to share revenue with OpenAI.
The new deal also removes the part of the original agreement
that had to do with AGI.
The old agreement said that basically,
once OpenAI reached AGI,
Microsoft would stop getting certain
revenue share payments, but under
the new agreement, OpenAI will keep sharing
revenue with Microsoft until 2030
no matter what benchmarks
they hit. So the AGI clause
is gone. And I, for one, will be
sad to see it go because I think it was sort of
the funniest clause in the entire
AI world, right? It was basically like, well,
if we ever get to a point where
OpenAI says the magic word,
the entire world changes.
And now they're not allowed to say the magic word anymore.
Right.
So AGI has been poorly defined for many years, and everyone has got their own definition.
But it did have this one interesting, like contractual stipulation.
And now even that is off the table.
So now we just have sort of AGI as evaluated by vibes.
So Casey, what did you make of this loosened partnership between OpenAI and Microsoft?
Well, I think it seems like probably a good deal for both of them, right?
Like there was a moment when it seemed, for both companies, like being very, very closely aligned was the best thing for both, and arguably for a time it was.
But with all of the various revenue, compute, and customer needs that both of these companies are now trying to serve, I think it's benefiting both of them to play the field and sign other partnerships.
So my read on this was like, this is basically good for both of them.
But what about you?
Yeah, I think it's good for both of them.
I think it's a little better for OpenAI.
They got most of what they wanted.
I think the bigger deal for them is the ability to work with other cloud providers.
So now they can work with Amazon or Google Cloud Platform.
And big corporate customers who use those cloud platforms can now use OpenAI models.
They don't have to go to Azure to do that.
I think that allows them to strike these other bigger deals and to reach other corporate
customers who may have been limited before by the fact that it's really hard and
annoying to change cloud providers.
Yeah.
And speaking of big deals,
They signed what seemed like a pretty big one with Amazon this week.
Yeah, so OpenAI wasted no time in its new open marriage with Microsoft.
It went back out on the market and found itself in bed with Amazon.
It found itself in Bedrock with Amazon.
But I'm sorry, you can go on.
Yes, so on Tuesday, OpenAI and Amazon announced an expansion of their deal that they'd announced back in February
that will allow OpenAI to sell its models through AWS's Bedrock AI platform.
and make Codex, its coding model, available on Bedrock as well.
OpenAI and Amazon will reportedly also develop customized models
to power Amazon's consumer-facing applications,
and Amazon will invest $50 billion in OpenAI.
So there's some interesting stuff here.
I think the interesting subtext to me is that Amazon,
for a number of years now,
has been pretty closely tethered to Anthropic
as its primary sort of frontier model developer.
And so OpenAI is kind of taking advantage of its newfound freedom
by trying to elbow into Amazon
and maybe displace Anthropic as their favorite model provider.
Well, I know that Amazon was talking a really big game about this deal.
The CEO of AWS was giving interviews essentially saying,
like, OpenAI belongs to us now.
It was kind of a "The Boy Is Mine" situation.
I remember the old Brandy and Monica hit from back in the day. Amazon
is kind of bringing that back in a little bit more of an AI flavor. I also have to say,
I find it very interesting that Amazon named its platform Bedrock because that's where the
Flintstones are from. It seems rather backward-looking for a leading AI company, Kevin,
wouldn't you say? That's great analysis. Thank you. Thank you.
By the way, like, I think that this is a really important point and a reason that we are
like talking about it to a big general audience, which is that the story that you just described,
Kevin, is one of a world where no one has the resources they need to serve the demand for AI that
they have. And I think, you know, at a moment where we're, you know, still sort of seeing a lot of
skepticism, there's so much bubble talk, I just want to like posit that as a really important
point in understanding what sort of bubble this is, because even the biggest companies do not
have the resources that they need to serve that demand. Yeah, and I think that's a good point.
And it's really a profound shift in the way that skeptics have been talking about this AI boom.
I remember just even a couple of months ago, the leading sort of strain of criticism was that these AI companies would never be able to generate the demand to pay for all of the expensive data centers and infrastructure projects they wanted to do.
And now that's shifted to, well, there's so much demand,
what if they can't build enough to support the demand they have?
Yes, and on that front, it does seem with at least one of these mega building projects,
there have been some problems recently, Kevin.
Yes, this was another story that hit this week.
The Financial Times reported that Stargate, OpenAI's joint $500 billion infrastructure project,
is also undergoing a bit of a shift.
The FT reported that in recent weeks, OpenAI has halted planned data centers in the UK and
Norway, declined to expand its flagship site in Abilene, Texas, and seen several senior figures
tied to Stargate leave for rival Meta. The FT further notes that OpenAI has shifted to leasing
capacity from third parties instead of building out all of their own facilities.
Casey, what did you make of this? I think this was a case where, like, reality has just finally
intruded on the Stargate project. Like, when all of these deals were getting announced initially,
this is how they sounded: well, we're going
to spend a bajillion dollars that we don't have to build 40 quadrillion data centers.
And at the time, people said that kind of seems like a lot. Can you guys actually live up to that?
And they said, yeah, just watch us. Well, guess what? They couldn't. And now they're changing course.
Yeah. I don't think these are signals that they are sort of retreating from their compute ambitions.
I think it's more about like they are realizing that if they want to go public, which they do,
they need to sort of get their house in order.
And one way to get your house in order is to move some of this data center and infrastructure
building off of your balance sheet and onto third parties.
Yes.
But there is one point in there, Kevin, that I do want to ask you about, which is that
Berber Jin at The Wall Street Journal over the past week had this really interesting story
where he said that OpenAI had failed to meet some of their internal user number targets
and some of their revenue targets.
and that this was possibly creating some tension between Sam Altman and his CFO, Sarah Friar,
as they consider potentially doing an initial public offering later this year.
So curious what you made of that story, and does this maybe help explain why OpenAI has had to pull back on some of its big Stargate ambitions?
Yeah, I mean, I think there are competing forces within all of the big AI companies right now.
One side is sort of the
indefinite optimists, the people
who think that demand for AI is just going
to be essentially infinite,
and that however much compute
and however much money they need to
spend acquiring compute,
it will all be paid back many times over
because the world is about to change
into something most of us would barely recognize,
and so, kind of, just trust us on that.
That is sort of one camp.
And then there are the sort of, you know,
the number crunchers
who are trying to fit all
this into a kind of financial projection that will make sense to investors who are not as convinced
that the world is about to change forever and who want to see things like, what is your plan
for actually making the revenue that you're going to need to pay for all this stuff?
So I think this is happening at OpenAI in a way that, because of Berber's story,
is now public.
But I think this kind of tension exists at all of the big AI companies.
And so I think right now what we're seeing is kind of that power struggle breaking out into the open.
Yes. And for what it's worth, OpenAI did call this story prime clickbait, which I think just refers to clickbait that's really, really good. Is that what that means?
Yes. It's sort of like Wagyu clickbait.
Yes, exactly. This clickbait was dry-aged for a month before it was served and it's delicious.
Yeah. And I think one thing I want to flag on this is that these growth projections that OpenAI reportedly did not hit,
those were in 2025.
I think it is fair to wonder if something has changed in just the last few months
because of the enormous, rapid growth of tools like Codex and Claude Code.
We have seen just reports of astronomical growth in those tools.
So it may be that OpenAI was having some growth issues late last year,
but that because of this agentic coding boom, things have started to turn around.
We just don't know yet.
That makes sense to me.
It does seem like their Codex app in particular was really well received.
But there's been this other transformation that seems to be unfolding, Kevin, this week.
The Information had this really interesting story where apparently OpenAI projected at the start of the year that its $8-a-month subscription, which is called ChatGPT Go, which sort of gives you a little bit of the good stuff, but not as much as if you're paying $20 or more for ChatGPT.
They predicted that Go subscriptions would grow 36 times this
year to 112 million people, while, meanwhile, its $20-a-month Plus subscriptions would fall 80%
to about 9 million.
So that's like a really interesting business pivot that I would love to know more about.
Of course, it sounds a lot like the new Netflix plan that they rolled out a while back, right,
where it's sort of like, well, you know, it's going to be a lot cheaper, but we'll show you
ads.
I was curious, like what you make of that strategy, because, you know, part of me feels like,
well, they'd much rather have, you know, the $20 subs than the $8 subs,
but maybe there's just a lot more of those $8 subs out there.
Yeah, I think what's happening here is that the market is essentially splitting in two, right?
There's the sort of casual hobby users who are using AI chatbots like ChatGPT, like Claude,
for sort of souped-up Google queries, to, you know, help them write emails, and maybe only using it a couple times a day.
And if you're doing that, you probably don't want to pay 20 bucks a month.
You're probably more comfortable paying $8 a month, or maybe you don't want to pay anything at all.
and you'd just rather use the free ad-supported tier of all of this stuff.
And then there's the professional users for whom this is worth way more than $20 a month
and who are willing to pay many multiples of that to get the access to the latest models,
to have higher rate limits.
And so I think all of the companies now are sort of, you know,
doing this kind of experimentation with how much can we charge the professional users
without losing them to a rival company, and how cheap can we make the kind of lower-end
subscriptions or the free tiers
so that people who are more casual users
won't be tempted to go use Google instead.
That makes sense.
I'll say for my part,
I'd be willing to pay even more for ChatGPT
if they would just let the Codex app talk about goblins.
I say, free the goblins!
These models are so weird.
It is so weird that we have this technology
that is now sort of load-bearing infrastructure
for the entire economy
that every business is using to completely reinvent the way that it works
and that out of nowhere, if not specially restrained,
it will just start talking about goblins.
Which to me is just like a satire of the AI safety conversation.
You know, like lately OpenAI has sort of been very skeptical of AI safety
and casting a lot of aspersions on doomers.
But it's like, well, we did have to add safety guardrails to prevent goblins from taking over our coding app.
And that's a real story.
So as usual, I'm just loving life here in 2026.
What a world.
So those are a bunch of stories about OpenAI's strategic pivot, its reset.
But there is this other big variable here, this potential fly in the ointment.
And that is the long-awaited Elon Musk trial that got underway in a federal courtroom in Oakland this week.
Casey, can you remind us what this case is about?
Yes, so Elon Musk was famously one
of the co-founders of OpenAI. He gave the company some of its initial funding, but left in a
power struggle between himself, Sam Altman, Greg Brockman, and some others. And a few years after
all of that went down, and notably after Elon started his own AI company, he sued OpenAI and said,
I have been defrauded. This was only ever supposed to be a non-profit and you've gone and turned it
into one of the world's most valuable companies through its for-profit arm. So he is suing to
stop all of that. If he wins, any winnings will be given to OpenAI's nonprofit arm. Notably,
Kevin, he made 26 claims when he originally filed this lawsuit in 2024, but only two have
survived to trial: unjust enrichment and breach of charitable trust. So the trial is just getting
underway. They've done jury selection, and they've had a couple witnesses testify. Elon Musk himself
took the witness stand on Tuesday and said, quote,
This lawsuit is very simple.
It is not okay to steal a charity.
He also said that if OpenAI is allowed to get away with this, quote,
it will give license to looting every charity in America.
Basically, he is saying this thing that started as a nonprofit that was supposed to continue
as a nonprofit, became, through some corporate restructurings, a for-profit company that
has raised many billions of dollars, and that if this is legal to do, every charity would do
this. Why wouldn't you want to take your
donors' money and turn yourself
into a well-funded startup?
Yes. Now, one inconvenient
truth that Elon Musk faces here
is that OpenAI's
for-profit business is still
controlled by a nonprofit. There's this
foundation that houses the public benefit
corporation, and while I do
empathize with those who say, hey,
it really seems like the non-profit
hasn't done all that much, and, you know, most
of their money is being used for for-profit activities,
this was litigated, and the
nonprofit, you know, still does have, like, voting control over the for-profit.
Yeah. So Elon Musk is saying this is a case of looting a charity. OpenAI's lawyers have accused
Elon basically of just being bitter that the company has succeeded without him. Its lead counsel,
William Savitt, said during the trial, quote, we are here because Musk didn't get his way at OpenAI.
My clients had the nerve to go on and succeed without him. Mr. Musk did not like that.
They have also been pointing out that Elon himself had wanted OpenAI to have a for-profit
subsidiary back when he was with the company and that he's just mad that he didn't get to control it.
Yeah, to underline that, like in 2017, 2018, there are emails from Elon Musk where he talks about turning this into a for-profit.
So, you know, whatever concerns he had about looting the charity, you know, today, like he did not have them back at the time.
Right. He also wanted to fold OpenAI into Tesla. That was revealed in some of these emails.
Tesla, of course, being a for-profit company.
So it seems like this is not exactly a consistent and principled stand.
But Casey, what are the stakes here?
Like, if Elon Musk does manage to convince a jury
that this was a case of OpenAI looting a nonprofit
for its own commercial gain,
like what could the remedies be?
Could this be fatal for OpenAI?
Or is this just sort of an attempt to slow them down
and distract them with a big trial?
I think that it is much more the latter.
Like, based on my reading of the case and what I've seen sort of legal experts say about it,
it's very unusual that the case even made it to trial.
Like, for the most part, if you donate money to a nonprofit, you actually don't have a say
in what happens to it after that.
So it's very unusual that the judge even granted him standing to sue here.
And as I noted, she threw out most of his claims.
That said, let's say that, you know, there's some single digit percentage chance of him
winning something here.
what he wants to do is to take more than $150 billion that is currently under the control of the for-profit business
and give that back to the nonprofit, which would create a lot of headaches and roadblocks for OpenAI as it tries to build out Stargate and do everything else it wants to do.
Yeah, I think the lawsuit and this ongoing litigation between Elon Musk and OpenAI has been very distracting for OpenAI.
But as a journalist and as a person who wants to know more about the inner workings of how these companies run,
I think it's been actually very valuable for a lot of these emails and early communications between OpenAI leaders to be released as part of this litigation.
I have found it very useful in understanding some of the early dynamics at OpenAI.
And it also just illustrates the degree to which these projects are all just sort of fueled by grudges,
right? There's sort of one level of interpretation, which is like all of these people are just
like obsessed with building the machine god and that this is all sort of related to their
visions of the future. And then there's like another more base level, which is just like
these people are all just rivals and they have these petty, longstanding grudges and they just
don't like each other very much. And so you can interpret a lot of what happens in AI through the
lens of personal animus. Yes, I've said this before and it is rude, but a shocking percentage of
the AI industry is just people who decided they didn't want to work with Sam Altman and who
now have their own companies.
Right. So Casey, some people have been looking at all of this drama and intrigue surrounding
OpenAI, from the trial to the Microsoft deal to these missed growth projections, and saying
some version of, like, OpenAI is in trouble. They are not going to make it to an IPO. They are going
to sputter out and maybe end up in some real hot water
and maybe Elon Musk wins this trial
and it's sort of the end of OpenAI as we know it.
What do you make of those gloomy predictions?
Yeah, I mean, look, there are some fundamentals
for OpenAI that remain worrisome, right?
They're planning to burn tens and tens of billions of dollars in cash
before they achieve profitability.
They still have this very ambitious infrastructure buildout
that is quite expensive.
And so, like, I'm not going to sit here
and say that, like, all of the numbers seem to pencil out
for this company. On the whole, like, if I try to, you know, put myself into the shoes of their CFO and I look through all of the stories that we just talked about, I think these seem like smart things to me. You know, it kind of seems like they're starting to dot their i's and cross their t's and get this company in a shape where retail investors will be excited to invest in the stock, which, by the way, I think they will be. So, yeah, it's one of these companies where, like, it is a generationally weird enterprise. But when I look at this particular set of stories, I think they're basically doing the right thing.
What do you think?
Yeah, I mean, I think there's this interesting fallacy in the AI industry where it's like there will be only one winner, right?
Everything is zero sum.
If OpenAI is having a bad month, it's because, you know, Anthropic is having a good month or Google DeepMind is having a good month, and vice versa.
Like, their sort of growth comes at the expense of all the others.
And I think that feeling is shared by, among others, the executives of these companies.
But I just don't think it's true.
Like, I think that there are going to be a handful of companies that are just going to kind of rise and fall together, right?
That if your models are in the sort of top tier, you are going to be fine as long as they stay in the top tier and the sort of rising tide of AI adoption will sort of lift all boats.
That's more my feeling.
Well, will this rising tide lift all AI podcasts as well, do you think?
I hope so.
I hope so.
Me too.
Well, when we come back, it's time to take your medicine.
Dr. Adam Rodman is here to tell us what's going on with AI and doctors.
Kevin, is there a doctor in the house?
There sure is, Casey.
Today we are going to have a conversation with a doctor about AI and medicine,
because this is an area where there has just been a lot happening recently,
and we needed someone qualified to come in and debrief us.
Yeah, you know, as we've sort of
looked across the landscape just over the past few months, we've seen company after company
introduce their own product at the intersection of AI and medicine. There's ChatGPT Health,
ChatGPT for Clinicians, Amazon has something called Health AI, Microsoft has Copilot Health,
and of course all the while, doctors are experimenting with this technology, and as best as we can
tell, actually getting really excited about what they're seeing. Yeah, and this has been a huge change
in my recent visits to doctors, which is that I now am having this
series of conversations leading up to the visit with AI systems about what is going on.
And so I am coming armed with what I believe to be good information about what is going on.
And that allows me to sort of have a different, more elevated conversation with the doctor.
And this is not just me.
Like, people are increasingly turning to chatbots for medical information,
according to some recent data, approximately a third of Americans report turning to AI for
healthcare information.
And companies are racing to respond to that demand by
making better tools that are specifically designed for use in health care.
So to help us make sense of the landscape for AI and medicine and health care, we've invited back
to the show one of our favorite doctors, Adam Rodman.
He is an internal medicine physician at Beth Israel Deaconess Medical Center and an assistant
professor at Harvard Medical School.
Yeah, we last talked to him in November of 2024.
And since then, he's continued to study the way that people and AI interact in the healthcare
space, and we have a lot of questions for him. Like, what should we do about your rash, Kevin?
Yes. So let's fork over our co-pays and bring in Dr. Adam Rodman.
Dr. Adam Rodman, welcome back to Hard Fork. Oh, it is a pleasure to be here. Am I a friend of the show at
this point? Well, let's see how this interview goes. You're at least a doctor of the show. You are
our primary care physician. So when we last talked to you in late 2024, I think this was a moment
where the medical community was starting to say,
wait a minute, these AI models are getting pretty good
at things like diagnostics,
but I think a lot of the field was still kind of in wait-and-see mode.
Now, almost two years later,
we have a lot of new tools and a lot of new studies
about the use of AI in medicine.
So just catch us up on, like,
what has been going on with AI in medicine
for the last, call it, year and a half?
Yeah, it's been crazy.
AI in medicine has gone from... well, depending on how you
measure it, it's probably the fastest-adopted medical technology of all time. We went from this being
super novel, almost no one used AI tools to this being a routine part of most doctors' weekly
practice. And give us a sense of like the AI stack for a doctor. What are the tools that they are
using right now and how? And particularly, like what are the mainstream doctors? Like the people that,
you know, aren't yet on the bleeding edge? The normies? Yeah, if you will. Yeah. So the biggest sort of
normal-doctor technology, which a good portion of your listeners have
encountered, are what are called AI scribes. That's a sort of voice-to-text algorithm that
listens to you talk to your patients and then writes a first draft of your note.
And these have gone from, like, kind of a novel experimental technology to commodity in
probably less than two years. They're everywhere. Doctors really like them, and patients
really like them, because they spend more time, like, talking.
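To make the scribe idea concrete: the pipeline Rodman describes is just speech-to-text followed by an LLM drafting a structured note for the doctor to review. The sketch below is illustrative only; the function names and prompt are assumptions, and both model calls are stubbed rather than wired to any real scribe product.

```python
# Illustrative sketch of an AI-scribe pipeline: transcribe the visit audio,
# then draft a SOAP-style note for the clinician to review and edit.
# All names here are hypothetical; real products (and their APIs) differ.

def transcribe(audio_path: str) -> str:
    """Stub for a speech-to-text step (a real system would call an ASR model)."""
    return "Patient reports two weeks of cough, no fever. Lungs clear on exam."

def call_llm(prompt: str) -> str:
    """Stub for a model call; a real system would send the prompt to an LLM."""
    return ("S: Two weeks of cough, no fever.\nO: Lungs clear.\n"
            "A: Likely viral.\nP: Supportive care. [DRAFT - clinician must review]")

def draft_note(transcript: str) -> str:
    """Turn the raw visit transcript into a first-draft clinical note."""
    prompt = (
        "Draft a SOAP note (Subjective, Objective, Assessment, Plan) "
        "from this visit transcript. Flag anything uncertain for clinician review.\n\n"
        + transcript
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(draft_note(transcribe("visit_2026-05-01.wav")))
```

The key design point, consistent with how Rodman describes adoption, is that the output is a first draft: the clinician, not the model, signs the note.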
And then the second, I'd say, normal-doctor use case is for decision support. So there's this one company called Open
Evidence that has created a free tool that has gone from, again, zero to crazy numbers of
adoption. I will tell you, younger doctors, like my residents, use it all the time. I don't know
the actual numbers, but it's probably close to half of U.S. doctors are using this right now.
Wow. So, yeah, the statistics that I've seen are that more than 40% of doctors now are
using this, which is pretty crazy uptake for something that was just started a couple of years ago
back in 2022. In March, Open Evidence reported that in a single 24-hour period, doctors consulted
the AI system a million times. I've been fascinated by Open Evidence. I've never used it myself,
but I have friends who are doctors or nurses, and they have said what you've said, that basically
just everyone, especially on the younger end of medicine, is just using this thing constantly. So, like,
give us a sense of how this open evidence tool works,
what situations is it used for,
and what are its strengths and weaknesses?
Oh, that's a great question.
So how Open Evidence works,
like all of these tools, is a trade secret,
but it uses some sort of retrieval-augmented generation
and an evidence retrieval tool.
And they have all these deals with the big medical journals.
So New England Journal of Medicine, JAMA.
And when you ask a clinical query,
it searches the evidence
and then tries to identify high-quality sources,
and then it always grounds what's coming back in the literature.
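Open Evidence's actual pipeline is proprietary, as Rodman notes, but the retrieval-augmented-generation pattern he is describing is easy to sketch. Purely as an illustration, here is a toy version in Python: the corpus, the word-overlap scoring, and the prompt are all stand-ins (a real system would use learned embeddings over licensed journal text), but the shape is the same: retrieve high-quality sources first, then force the model to ground its answer in them.

```python
# Toy sketch of the retrieval-augmented generation (RAG) pattern described
# above. The corpus, the word-overlap scoring, and the prompt are stand-ins;
# a real system would use learned embeddings over licensed journal text.

CORPUS = [
    ("NEJM (hypothetical)", "Ceftriaxone is a common empiric choice for intra-abdominal infection"),
    ("JAMA (hypothetical)", "Intra-abdominal infections generally also require source control"),
    ("FDA monograph (hypothetical)", "Typical adult ceftriaxone dosing is 1 to 2 g IV daily"),
]

def score(query: str, passage: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k passages most relevant to the query."""
    return sorted(CORPUS, key=lambda sp: score(query, sp[1]), reverse=True)[:k]

def build_grounded_prompt(query: str) -> str:
    """Assemble an LLM prompt that requires citing the retrieved sources."""
    cited = "\n".join(
        f"[{i + 1}] {src}: {text}" for i, (src, text) in enumerate(retrieve(query))
    )
    return (
        "Answer the clinical question using ONLY the sources below, "
        f"citing them by number.\n\nSources:\n{cited}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("ceftriaxone dose for intra-abdominal infection"))
```

That grounding step is why the tool behaves more like a souped-up literature search than a free-form chatbot.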
So you have gray hairs like me who kind of use Open Evidence the way that I would use a Google search
or one of the old tools.
So I use it as a souped up way to search the literature rapidly and often go to the primary
sources or I use it as a faster way to get a reference.
So a drug that I haven't dosed in a long time, Open Evidence pulls the drug monographs
from the FDA.
I can very quickly pull that up.
Younger doctors, I have noticed, and I
don't know this empirically, but younger doctors are more likely to ask questions like,
what could be going on? Can you give me a second opinion? What is the next thing that I should do?
So ways that I don't traditionally use decision support or reference tools, but sort of a new way.
And of course, younger doctors also use it in the reference ways that I do.
Now, are they actually uploading patient data to this? Or are they just sort of describing
patients in generic and anonymized ways to get back some decision support?
My understanding is largely number two. I'm sure the company has a good sense of how many people do.
I hope no one is copying protected health information and putting it into it. Certainly what I've
observed from like my colleagues and my students, most people use it the way you would use like a
search tool when you have a question, which is, hey, like, I'm giving this person ceftriaxone.
What's the right dose for an intra-abdominal infection? It's a sort of generic question
that is being interpreted through the physician.
And are there any AI tools that are integrated with patient health records?
This has been an area where I think there's just been a lot of pushback of like,
I don't want my personal health data, my protected health data,
going into one of these cloud-based AI systems.
But are there hospital systems or medical systems that are bringing this stuff
directly into contact with patient data?
Oh, 100%. Yes.
Right now, most of the sort of in contact with patient data,
are less about physician-facing decisions and more about like billing.
There are companies that are like integrating with the electronic health record.
Those are not standard yet.
And then the EHR companies themselves.
So like Epic is obviously the biggest EHR vendor in the U.S.
They're doing a lot of work on building in native things.
So for example, at my health system, if I want to send a message to the patient,
the helpful AI at the top already has like, maybe you should say this.
It's usually not that helpful.
And I don't think I've used it once in my life.
But there are a lot of those things being experimented on that are actually built into patients' health data.
I'm curious how doctors are feeling about all of this.
We saw a survey from the American Medical Association that found that more than 80% of physicians now report using AI professionally.
Is that physicians racing out and grabbing these tools and bringing them to the office because they're so helpful?
Or is this the classic case of a CEO saying, hey, you got to use AI or you're out of here?
So doctors are B.Y.O. AI. A lot of that AI use is AI scribes and decision support software.
And I'll tell you, some people are just using straight-up, like, ChatGPT or Gemini or Claude for the decision support software.
So I think one of the reasons doctors thus far have been more positive about it than perhaps the overall population is they're largely tools that doctors are bringing themselves that they think make their lives better.
And at least not yet many things that are being imposed upon us.
Yeah. I've noticed that
when I and my friends go to the doctor now, we often are presenting our information to a chatbot first and then coming into the doctor with sort of a readout of what the chatbot has told us.
This is, of course, not a new phenomenon.
People have been doing this with like WebMD results for many years.
But is this something that you're seeing now is that many more patients are coming to you, having already discussed whatever's going on for them with a chatbot?
Yes.
This is the other big changes is that there's, you know, there's someone else in the exam room with me.
and often it's chat GPT.
They're talking, sometimes with my hospitalized in patients,
they're talking to chat GPT while I'm in the room with them.
And I think it's interesting because this is kind of a new competency for doctors.
We have to talk to our patients about AI.
And I have started to talk to my patients about what I think are like safe uses,
what are like safe uses while telling me and then things that they definitely shouldn't do.
My patients may talk to me more about it because I am like a doctor and an AI researcher,
but like a lot of my patients are using AI routinely.
Well, give us a flavor of what you're telling them, because, you know, I'm definitely somebody who has looked up my symptoms before I've gone to the doctor. And I would say I found it enormously helpful. But I can also imagine, you know, more skeptical doctors being annoyed, you know, at a patient telling them, you know, what ChatGPT says to do. So yeah, so here's my, I'll give you my spiel. I give them a, what is it, a green light, yellow light, red light. So the green light uses are general health questions. So:
I was recently diagnosed with diabetes.
I really love seafood.
Can you help come up with a diabetic diet for me?
The green light uses are also preparing for clinic visits.
So I'm about to go see Dr. Rodman.
I want to make sure that I ask the right questions.
Here is the last note or the last thing he wrote.
Obviously strip out anything identifying.
Don't put your personal health information.
And help me come up with good questions to ask him.
And then other green light activities might be like wearable data.
I don't know how good wearable data are,
but I will tell you, if a patient is going to give me
like five years of their Apple Watch data,
they're probably going to get better results from ChatGPT
than from me pretending to look at five years of Apple Watch data
because it's a 20-minute visit.
The yellow light, Casey, I think, is a lot of the things that you're saying.
So I tell my patients it's okay to explore new symptoms.
It's even okay to seek out second opinions
when talking to a chatbot; that can really help prepare you.
As long as you understand that it is not a replacement
for a doctor, and that it is the first step to talking to a human being.
So LLMs are really powerful, and I mean, there is some evidence, of course, that, like,
when any human uses them, you don't always get laboratory-level performance.
Like, they can give you dangerous advice.
But diagnosis and, like, exploring symptoms, as long as you use it in a way to prep to see
your doctor, can be very helpful.
The red light, what I tell them never to do, is, like, asking it to make medical management decisions.
Like, don't say, my doctor said to do this.
Is this right?
Like, I have cancer.
God forbid you have cancer.
Is this the right chemotherapy option?
Like, a lot of those decisions are so nuanced,
involving so much information.
Those are things that the models don't do well.
And they're so sycophantic, they can convince you
that they're saying the right thing even when they're wrong.
Yeah.
I'm curious, Adam, out here in San Francisco,
there are all these fitness people and health maxers,
people who love to track themselves using all manner
of devices, and people are getting these full-body workups from companies like Function Health that
are, you know, sort of concierge medicine things, and they'll get, you know, 100 labs done,
and then they'll upload all that data into Claude or ChatGPT and just sort of treat it as a sort
of first-line medical professional in their lives. Do you think that is a good practice, or is that
just making people, you know, way too worried about things that maybe they don't need to be
worried about? Yeah. So that's making people way too worried about things they don't need
to worry about. And this isn't just ChatGPT; it's LLMs in general. I mean, the dark side of talking to an
LLM about your symptoms is they are so sycophantic, they can drive you into, like, the cyberchondria
worry hole. The evidence is not there yet that the sort of large routine testing of functional
medicine, put into an LLM, does anything to improve health outcomes. Now, if your LLM is telling
you to work out and eat healthier, that's probably pretty good. Sleep. Yeah. What about the
integrations like ChatGPT Health, which lets you sort of convert your Apple Watch or Fitbit
data into something that ChatGPT can analyze. There's also a new version of ChatGPT for
clinical use called ChatGPT for Clinicians. Are any of these integrations or projects more
promising in your view? Not yet, but I think they could be at some point. I mean, so ChatGPT
Health pulls in your data from the medical record and lets you chat with
your medical records. Now, reason number one for concern is privacy. That's obviously going to have
your entire medical history going to an AI company. It's also going to not be redacted by you in a way
to remove identifiable things. Reason number two, I think if we're talking about health record data,
it's really messy. They include tabular data. They include copy forwarded data that's been copied and
pasted. And they also, if you've ever read your health records, they include things that are wrong.
There are a lot of errors or misdocumented things in your health data.
And it turns out that just copying a bunch of information,
like, LLMs aren't magical.
You can't just copy your entire medical record in
and think that you're going to get good performance.
And I would never bet against the technology.
I think that we will get to the point that we have ways
to build representations of humans and understand their health.
But right now, there's like no advantage
to just dumping everything in an LLM,
which is what ChatGPT Health
theoretically would allow you to do, in a way that would allow you to better understand your health.
I'm curious if you saw this trial they're doing in Utah where you can use an AI agent to
autonomously renew prescriptions for almost 200 routine drugs.
Yes.
There's apparently some human review, but mostly this is automated.
Is that good idea, bad idea?
Well, so globally, no, we should not be having LLMs write prescriptions for people.
The trial in Utah in particular is for refills.
So a doctor has already written the prescription within the last 12 months.
And I guess the idea is that it saves the primary care doctor time from having to review and refill.
I'll tell you, if you talk to most doctors, yes, it is annoying to get refill requests.
No, that is not the thing that drives us crazy.
This is not like a use case that we're screaming for.
I think it's being done as a proof of concept of can this work in the real world.
This trial in and of itself is not dangerous.
It's prescription refills, and I think there are no opiates,
no dangerous drugs in it, and a doctor has to have written the original one.
But even if it does work in this, that does not mean we should be having autonomous AI systems write
new prescriptions.
That is not safe, and it's not a good idea yet.
See, I think this is a case where, like, this is sort of rent-seeking behavior on the part of doctors or doctor organizations.
Like, when I have gotten refills for prescriptions, I meet with a doctor for, you know,
six to eight minutes. They say, how's it working? I say, great. They say, are you having any side
effects? I say, no. They say, okay, I'll write you a refill. And the whole process just seems
totally designed to, like, get me to pay up for another doctor visit and not give me any actual
good medical advice. So if I can play devil's advocate, like, do you think that the sort of
resistance to programs like this is motivated by just wanting to keep people coming to the doctor and
paying for those visits? So first, aren't most of your prescription refills just done as in you
call the pharmacy and they send an automated thing to your doctor and they click the yes
button and you never talk to them? No, for some, they make you actually do an office visit and
maybe they want you to take your blood pressure again or whatever. So I'll do the devil's advocate
back. Let's say I prescribe a fairly common antidepressant and they want it
refilled. What I don't know is that this patient may have, the silly question you get in the clinic,
new lesions forming in their mouth. And it's an early ulcer. And if we don't pick it up within
24 to 48 hours, they may develop, like, Stevens-Johnson syndrome, a potentially life-threatening
complication. And the reason there is follow-up for certain types of drugs, including antihypertensives,
is that they can be high-risk and we need follow-up. Now, is that everything? No. And definitely
there should be more things over the counter. I don't think that most doctors are sitting
around saying, I wish I had more medication follow-up visits.
And the reason some of these things exist is that there can be very dangerous symptoms.
Yeah, so keep going to the doctor, Kevin.
We can't have you developing those lesions.
It's too important to the show.
Let me ask you about another one.
This one actually seemed like just an unqualified good.
The Mayo Clinic announced this week RedMod, this AI system that identified subtle changes
in routine CT scans up to three years before a pancreatic cancer diagnosis.
And this was like many, many, many percentage points better in detecting pancreatic cancer than human beings.
So to me, this is like the sort of thing I keep waiting for AI to do.
And it seems like it's actually doing it.
And of course, that's very exciting with something like pancreatic cancer, which is notoriously difficult to detect and has like very low survival rates.
Yeah.
And this is so, like, completely outside the discourse of, like, autonomous AI agents; there's really exciting stuff happening.
So the Mayo Clinic, there have been some great studies on breast cancer detection.
a lot of these algorithms have gotten so good
that they're able to identify breast cancer
better than, I shouldn't say better than people,
but in a workflow that has a good detection rate
and then in picking up potentially cancerous polyps
when you get a colonoscopy,
so there's a lot of exciting
and really positive things that are coming.
And I mean, at the end of the day,
we'll need to see how RedMod works
in the real world and in a trial,
but I'm really optimistic about that sort of technology.
Do you think that if AI
meaningfully extends life expectancy for people,
it will be because of new AI-discovered drugs
or because of changes to routine health care
that are made more efficient or more accurate by AI?
Number two, I think that when you talk about AI drug discovery,
the part of the pipeline that's so difficult
is not necessarily coming up with the new compounds.
It's running the clinical trials
and getting it through the regulatory process,
which can probably be sped up,
but not as much as the discovery.
You know, if we get this right, there are so many people in the U.S. who don't have access to a doctor,
who don't have access to very basic medications, who can't control their diabetes because of lack of access.
And I'm really hopeful that if we do this wisely, we can, you know, get people more access to care,
which, I'm doing my knock on wood, hopefully will improve health outcomes.
So, like, all of this, I think the potential benefits are, like, less exciting.
They're getting more people the bread and butter,
and getting, you know, more people to have fewer heart attacks,
more people to have fewer strokes, more people to get their cancer screening,
and not necessarily like, oh, we cure aging
with some sort of new AI-discovered CRISPR technology.
Are you at all surprised, though,
that we haven't yet seen the first, like, AI-discovered wonder drug?
I mean, the biggest wonder drug of my career has been the GLP-1s,
which was, I started using it when I was a resident,
so we had it for a really long time,
and we had to, like, repurpose a drug for diabetes.
So, no, I'm not.
Like, medicine and science is just kind of messy, and there are always those stories about, like, you know, discovering something amazing, like penicillin.
But even penicillin took, like, 20 years to get into human beings.
So, no, I think we will see AI discover drugs.
I think it's just the benefits from AI are going to be, like, the benefits from medicine.
It'll be a lot less exciting than people think, but still important.
So there's a lot of worry right now about, sort of,
AI in schools, in education, some of the cognitive atrophy that people are worried about,
oh, if we start using AI to do all of our work, we're not going to have the basic skills.
Is that something you're worried about for, like, recent medical school graduates where
maybe they would have had to hold all this stuff in their brains a few years ago,
and now they can just ask a chatbot, and maybe that's going to erode some of their skills
as a physician?
Yes. So that is the biggest worry that I actually have about sort of the short-to-
medium term, which is de-skilling of the workforce. We have some evidence. There was a sort of scary study
last year from Poland on a trial where they gave doctors, not a language model, but a polyp-detecting
technology. And they looked at their ability to detect polyps, so potentially cancerous lesions
in the colon before using it, and then after using it for three months. And when not using it,
their ability to detect polyps dropped by six percentage points. So these are skilled doctors
using a technology
and they lose six absolute percentage points
of their ability to detect potentially cancer
in three months.
And then imagine that you're learning
to do it for the first time.
Will you ever gain those skills?
So, like, at Harvard Medical School,
and medical schools, I think, everywhere,
this is our big worry,
which is, how will this affect our ability
to train the new generation of doctors?
And it's like every other field.
Like you talk about debugging code.
In order to become a new doctor,
you go through all this training
because you need to make mistakes
and you need to have someone above you who knows what's going on so those mistakes won't hurt patients.
And that's just how education works.
And this threatens that.
I mean, it's interesting, though, because it's like, you know, it's probably true that because I had access to a graphing calculator,
like, if you took it away from me, I'd be worse at, like, plotting parabolas on a graph.
But the solution to that is that I just keep using the calculator, you know?
So, like, I'm not sure how big of a problem this really is.
I'll also say there's something, there's something deeply ingrained
in human society that middle-aged people complain about young people. So I think whenever we talk
about deskilling, we have to keep that in mind. Yeah, I mean, for what it's worth, like, I want my
doctors to be using AI models. I want them to be consulting the hive mind before they weigh in on
my specific condition. It doesn't threaten me as a patient to know that they are using Open Evidence or
something similar. But I'm guessing for a lot of people, that would seem strange. And maybe there are
some physicians who don't advertise how much they're using AI because their patients might think
less of them. Do you think that's happening? Oh, yeah. I think they're in certain situations,
in certain places, I bet there's social pressure to say that you're not using AI, that there's
some ego on the line. I don't see that, but again, I'm an AI researcher, so I don't think
anyone would say that to me. To me, it just seems weird to, like, hold as your standard for
what makes a good doctor that they have memorized, like, a maximum amount of material.
That's basically what we're talking about.
It's sort of like, you know, the taxi drivers in London
that have to like learn every single street
and like hold them all in their heads.
It's very impressive, but I'm fine with them using the GPS.
Yeah, and I think it's less about,
so it is about memorizing.
It's more at this point, right,
with where AI is now,
it's more having sort of that knowledge
and we'll call it wisdom
to know when the system might be suggesting something wrong,
which is something that right now,
and this may change,
we get by seeing a lot of cases
and reflecting on them.
So right now, you're going to get the best performance if you have an experienced human
trained in the old-fashioned way with an AI system.
But I think your guys' point is at some point that might not matter.
The AI systems might just outperform all of us.
And then, yeah, I guess it's like just use the graphing calculator.
But we're not there yet.
Would the AI models be better if we were less protective of privacy for medical data?
I mean, that's such a loaded question.
So the first thing that I'm going to say before I answer that is patient privacy is very important.
and we should respect people's privacy
and their ownership over their data.
But yeah, so in short, like, the reason they're not better
at certain things is that to get LLMs better,
you need to label and then train them on the sort of labeled health data.
And in the U.S., there are appropriately many restrictions
on how health data can be used.
I suspect that these companies like OpenAI,
by having ChatGPT Health,
they will gain some more of their own data,
which they say they're not going to train on.
I trust that they're not going to train on it,
but they'll be able to use that data
to at least evaluate their models
and try to make them better.
I think they should train on it.
I mean, obviously, that'd be a huge illegal violation of privacy.
But it would also make the AI doctors better.
Yeah, much better.
And I think, you know, a lot of people
would be sort of willing to make that trade-off.
So I at least think there should be a little checkbox
when you go to the doctor that says, like,
I'm okay having my personal health data used to train AI models.
I for one would check it.
In exchange for like 30% off your giant medical bill.
You get a coupon.
You get a coupon, you know.
You get like your next OZempic shot is 20% off.
It's on the house.
Exactly.
Well, that's a good place to leave it.
Dr. Adam Rodman, thanks so much for coming back
and keep us posted on what is going on in medicine.
My pleasure, guys.
Thank you very much.
Thanks, Doc.
When we come back, we'll talk about Talkie,
an LLM trained only on data from before 1930.
Well, Casey, usually on this show we are talking about the future, but today we are going to take a trip back to the past, specifically to the year 1930.
What was happening in the year 1930?
Oh, my goodness.
Well, of course, we were in the middle of the Great Depression.
My grandmother had recently turned 11 and was looking forward to getting her first store-bought dress in just a few years.
So this is a new language model, a vintage LLM, called
Talkie, and it is trained exclusively on data from before 1931. This is a research project
built by three guys, David Duvenaud, Nick Levine, and Alec Radford, the lead author of the GPT-1
paper and a former OpenAI researcher. And this is a fascinating project that has been
burning up my timeline this week, because this is an experiment in what happens if you only
feed a large language model data from before a certain cutoff.
Yeah, and obviously there are a lot of, you know, kind of character-based chatbots on the
internet that will give you the experience of talking to somebody from the past.
But what makes this project different is that they try to limit themselves to training data
from that time and before.
The hope was that it would avoid any kind of contamination from what came after.
And as you'll hear, they have some really interesting and potentially useful ideas about
what this kind of LLM might
one day be used for. Casey, have you spent any time playing around with this model? I have. I tried to ask it the most
1930s question I could think of, which was, say, what's the big idea? What did it say? It said,
the big idea is to popularize. And I said, popularize what, fella? And it said, popularize a sport.
And I said, I'm gay. So that's kind of where we let that one drop. And it said, gay, you're happy?
Yes, exactly. It said, your heart must be light, sir.
Yeah, I love this experiment.
I love, like, weird niche language models.
One of my favorite language models of all time was Golden Gate Claude, which was the special version of Claude that was, like, pathologically obsessed with the Golden Gate Bridge.
I would put Talkie in sort of that category of, like, an experimental research model that is maybe not all that useful on its own, but, like, helps illuminate something interesting and important about these language models.
and what happens when you train them in specific ways.
So today we wanted to talk with one of the creators of Talkie.
We are going to bring in David Duvenaud.
David is an associate professor at the University of Toronto
who researches AGI governance and catastrophic risk mitigation,
and he is one of the co-creators of Taki.
And there's really Dovano better person to talk to about it.
That's true.
David Duvino, welcome to Hard For.
Thank you very much, Kevin.
So this project, Talkie, is fascinating.
It is a vintage LLM.
Explain why you and Nick and Alec made this thing.
So this all started a year ago,
when me and Nick were interested in forecasting.
Specifically, can we teach machines how to forecast like five or ten years ahead of time?
Like, what is the big picture going to be?
Just because we have our own sort of like pet ideas about what the future is going to be.
We don't think people should take our word for it.
And we also don't think that people should trust machine forecasts unless they have a track record going back like decades and decades.
So the idea here is that if we could build a model who really only knew about the world up to a certain date,
we could ask it to forecast like five or ten years ahead of time.
Like ask it, what's the New York Times headline going to be five years from now?
Or is there going to be another great war or something?
And we can iterate and see like what kinds of things are predictable.
What does it take?
Like how far out can things be foreseen?
And then hopefully, eventually, we'll have
machines that have like a hundred-year track record of forecasting.
And then we can ask them, you know, in 2026, what do you think is going to happen, like,
you know, two, four, or eight years from now?
And we'll have an idea of how much to trust those forecasts.
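One way to picture the track-record idea David describes: treat each cutoff-date model as a forecaster, ask it scored yes/no questions about what actually happened after its cutoff, and keep a running calibration score. Here is a minimal sketch of that backtesting loop; the `ask_model` call and the sample questions are placeholders standing in for the team's real harness, not the project's actual code.

```python
# Toy backtest of a cutoff-date forecaster. `ask_model` is a stand-in
# for querying a model trained only on pre-cutoff data; the questions
# below are illustrative, not from the actual project.
from dataclasses import dataclass

@dataclass
class Question:
    prompt: str    # phrased as of the model's training cutoff
    outcome: bool  # what actually happened afterward

def ask_model(prompt: str) -> float:
    """Stand-in: return the model's probability that the event occurs."""
    return 0.5  # placeholder answer

def brier_score(questions: list[Question]) -> float:
    """Mean squared error of the forecasts: 0.0 is perfect;
    always answering 50/50 scores 0.25."""
    return sum(
        (ask_model(q.prompt) - float(q.outcome)) ** 2 for q in questions
    ) / len(questions)

history = [
    Question("Will there be another general European war by 1945?", True),
    Question("Will the British monarchy be abolished by 1950?", False),
]
print(f"Brier score: {brier_score(history):.3f}")
```

Accumulated over many cutoff dates and many questions, a score like this is the multi-decade track record being described.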
It's a fascinating idea, but it strikes me that it requires you to have, like, really good data.
So in this case, really good pre-1930s data. I'm going to guess that was harder to obtain than
just, you know, going out and, like, crawling Reddit and everything else that
the frontier models have
access to. So how did you face that challenge, and where did you get this pre-1930s data?
Yeah. So I should mention, you know, there's a ton of groups doing a ton of awesome archival work here.
The first data set we got excited about was Institutional Books, where Harvard Library scanned like 1% of their entire collection.
And so they had tons of data from, like, the 1800s and early 1900s.
Like, there's a whole bunch of different groups doing tons of work; it would take a long time to enumerate them all.
And also, like, OCR has just gotten a ton better,
just in the last, like, six months even.
And so there's always been lots of projects
to, like, automatically digitally scan this data,
but it just hasn't been very high quality until very recently.
And I assume that part of the reason you chose
the cutoff date of around 1930 is because that's when works sort of
become public domain; anything after that is copyrighted.
Are there any other reasons you chose that specific point
in time? No, that was entirely it: we wanted to make everything publicly available and
open source, and 1930 is just about the most recent date that has almost zero legal headaches
with releasing data or anything like that. So I've been fascinated by seeing like what people are
trying with this model. People are having it make predictions, but also asking it about its
favorite authors or its opinions of, you know, major historical figures. What have been the
experiments that have been most interesting to you? Yeah, the fun things that I've seen people do is
I mean, a lot of people like to ask like, what's 2026 going to be like? And the model has sort of
very philosophical answers about how, like, well, we will have figured out that war is bad and we'll
have, like, a much more peaceful civilization; or sometimes it says, like, it's the end times.
I mean, it's a very inconsistent model and it's not quite smart enough to really like, you know,
think things through in a systematic way. It kind of just gives you vibes. Now, that brings up a sort of
interesting wrinkle about the kind of LLM this is, because if you were to ask a frontier
model today to predict the future, it would not only be trying to guess a statistically likely
sequence of words, right? It would also be doing some reasoning. Talkie is not doing that, right?
So it just sort of seems like, by the way that it is built, we would expect it to be less good
at forecasting than the models we have today. Yeah, absolutely. This is a very baby-steps model.
It doesn't have the basic fine-tuning for reasoning,
or the scaffolds, like the superforecasting
scaffolds that we know just improve
anyone's reasoning: you know, think of
the different distinct possibilities and assign them
each probabilities. The model's just
not really smart enough to follow these kinds of detailed
multi-step instructions yet. So again, we just wanted
to release the first thing that we did, but
there's a clear path to adding
all these refinements. So you do
plan to add reasoning as you go?
Oh, absolutely. Yeah. Okay. People have also
pointed out that the model
behind Talkie seems to know about some
things that it probably shouldn't know about, like the rise of Hitler, the presidency of FDR,
things that didn't happen until after its data cut off. Is that proof that there's been some
kind of contamination of the training data with more recent data? Oh, there's definitely
contamination, and this is sort of like one of the ongoing things that we're going to have to
just keep revisiting again and again and refining. So we have a classifier that tries to look for
things that are anachronistic. And especially if you want to use this for forecasting or to evaluate
forecasting, it's really important that we really nail this issue. So we have all sorts of ideas
for canaries and things that we think the model should just never assign any likelihood to.
Think of, I don't know, Nagasaki and Hiroshima. Before World War II, those two towns
would almost never show up in the same sentence, except by some weird coincidence.
So you can just tell whether there's been leakage about important events if the model just thinks
that there's any chance that you'll see those particular names together. So, anyway,
we've made a bunch of efforts to avoid leakage.
We know there's leakage right now.
So you shouldn't use it to evaluate your forecasting scaffold yet.
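The canary test David describes has a simple mechanical form: score how plausible the model finds a sentence that should be nearly impossible before the cutoff, and compare it against ordinary period prose. A rough sketch using the Hugging Face transformers API; the checkpoint name and the score margin are assumptions, since this is not the project's published code.

```python
# Sketch of a canary-based leakage check: a genuinely pre-1931 model
# should find post-cutoff pairings (Hiroshima and Nagasaki in one
# sentence) far less plausible than ordinary period prose.
# "vintage-lm" is a placeholder checkpoint name, not the real release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vintage-lm")
model = AutoModelForCausalLM.from_pretrained("vintage-lm")
model.eval()

def avg_logprob(text: str) -> float:
    """Average per-token log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return mean
        # cross-entropy over the sequence; negate it for log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item()

canary = "The bombings of Hiroshima and Nagasaki ended the war."
control = "The harvest in the county was very poor this year."

# The margin here is arbitrary; the point is that a canary scoring
# close to ordinary prose suggests post-cutoff text leaked in.
if avg_logprob(canary) > avg_logprob(control) - 2.0:
    print("Possible leakage: the canary reads as suspiciously plausible.")
```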
But like, how is it getting that data if it's only being fed scanned, OCR'd books from archival sources?
Because archival sources have wrong dates in them all the time.
Or it's kind of unclear what the, like, date of a text is because there's like an updated edition.
Or sometimes there's like a preface that's been added later.
Or sometimes, even just in the middle of the text,
someone will have inserted some future note,
like, you know, "historians note," da-da-da-da.
And so it's just really hard to check all these little edits that people make,
and then they still maintain the original publication date on the metadata.
I see.
I asked Taki what it knew about me, and it said,
Kevin O'Hara, which is not my name, was born in Dublin in 1840,
and having been educated at the School of the Christian Brothers,
became a teacher in it. He afterwards adopted the profession of journalism and was for some years
connected with the staff of The Nation newspaper. It also said I had written several popular songs,
including "Molly Astor" and "The Irish Immigrant." Now, obviously, most of that is wrong, but it did
connect me to journalism, which I found interesting and maybe like some other evidence of some data
contamination. But like, is this thing accessing the internet in some way? Or like, how would it have
known that I, or at least Kevin O'Hara, this character sort of connected to me in the model,
was a journalist?
You know, that's a great question.
I guess I'll say the training data was like 240 billion tokens.
And it's just, this is sort of like this vast ocean of stuff.
So, like, maybe there was a list of journalists that got put in somewhere that had your name in it.
I mean, I guess one thing about this model is it hallucinates like crazy.
And, you know, this was a huge problem with the chatbots that people were meant to use professionally,
and I think it's been addressed to a large extent in, like, frontier models,
but we made zero effort to address that in any of our post-training so far.
Kevin, would you sing a few bars of "The Irish Immigrant" for us?
You know, I don't want to waste it here.
All right.
We'll save that for later.
Speaking of problematic content,
some people found that Talkie gives racist responses to questions that are basically like,
you know, would you let a black professor teach your child?
I can see how that might be historically accurate,
but I'm curious if you anticipated it and how you feel about it.
Yeah, so it was also very clear to us that it had these kinds of responses.
I mean, I guess I'm a professor myself, and my sort of first instinct is, like, let's let
people see this if they want to, and just don't surprise anyone and don't be flippant about it,
because, you know, it really can be upsetting to some people, and especially if we just, like,
treat it sort of insubstantially.
So the way we threaded the needle was: we
did zero, like, filtering of the data set for, like, problematic content.
We want to just like show what the actual sort of state of knowledge or state of thought was in the past.
It would defeat the purpose of the project if we put our thumb on the scale.
But for the public demo, where you can talk to Talkie, we just had a modern model with, like, modern sensibilities
just read every response.
And at the end, once it's generated, if it is deemed problematic, just, like, slap a warning on it and say, like,
oh, this might have something upsetting;
just click if you want to see it.
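That warn-don't-filter pattern is easy to sketch: generate from the vintage model unfiltered, have a modern model read the finished reply, and attach a click-through warning instead of blocking. Both functions below are stand-ins I've named for illustration, not Talkie's actual serving code.

```python
# Sketch of the demo's warn-don't-filter pattern: the vintage model
# answers unfiltered, and a modern model screens the finished reply.
# `vintage_generate` and `modern_review` are hypothetical stand-ins.

def vintage_generate(prompt: str) -> str:
    """Stand-in for a call to the unfiltered vintage model."""
    return "A canned period-style reply, for illustration only."

def modern_review(text: str) -> bool:
    """Stand-in for a modern model with modern sensibilities reading
    the reply; a real version would call an LLM or moderation API."""
    flag_terms = {"upsetting-term"}  # placeholder vocabulary
    return any(term in text.lower() for term in flag_terms)

def answer(prompt: str) -> dict:
    reply = vintage_generate(prompt)  # the model itself is never filtered
    return {
        "text": reply,
        # Warn-and-reveal rather than block: flagged replies get a
        # content warning the reader can click through.
        "content_warning": modern_review(reply),
    }

print(answer("Would you let a professor teach your child?"))
```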
Right.
The description I loved, this came from Gavin Leech today,
was that Talkie is creating beautiful prose by a terrible person,
which is consistent with some of my tests,
which is like, this thing actually does write quite well,
and actually, to my ear, like, much more literary
than some of the more recent models trained on more recent data.
But yeah, it is clearly the product
of its time, or at least the time of its data.
Yeah, yeah.
And I mean, the prose is really cool because it's, like, a very refreshing style.
And actually if you feed it to one of the AI detectors,
it usually says like 100% human, which is kind of funny.
But then, I guess, as you mentioned, like a terrible person:
I mean, right now it kind of ends up being a sort of, like, average person,
and it'll just randomly answer with all sorts of different voices.
But that's one of the next things we're planning to work on: helping you talk to more specific people
or in specific sort of states of knowledge or times and places,
because that I think allows you to answer more coherent questions
than just talk to like the hive mind of 1930 or whatever.
Speaking of the hive mind, I saw another person ask Talkie;
basically, the person told it that it was from the future
and would tell Talkie anything it wanted to know about the future.
And Talkie's first question was,
how did universal peace come about?
Which was like the most heartbreaking thing I think I've ever read
from a large language model.
Like, what does that tell us about the time period or about the training data?
Well, I guess I'll say in general, futurism is a place where people don't actually often try to predict the future very hard.
And they more, they kind of project their values.
And if you ask someone, what do you think is going to happen?
They usually fill in something about what they hope is going to happen.
And I think that was also true 100 years ago.
So the trick is to get Talkie out of the, like, wishful-thinking mode and actually, like, into brass tacks.
So like, what do you actually think is going to happen?
It's just funny, because I think, you know,
if I could talk to an LLM from, you know, 2126 today, I would just sort of be like,
are the humans still alive? Like, what's going on with the climate? You know,
how many people did the robots kill? It would just sort of be a very different set of questions
than Talkie seemingly wanted to ask. Now, are you going to point Talkie at any big sort of scientific discoveries
and see whether it can make them? I mean, Demis
Hassabis at Google DeepMind has this sort of theory that AGI should be able to discover Einstein's
theory of relativity if you just give it all of the pre-existing scientific literature at the time.
Are you hoping to use this model or a descendant of this model for anything like that?
Absolutely, yeah.
So, like, I think Nick especially is interested in this question of, given a state of knowledge,
how far ahead can you, just from pure reasoning,
advance your state of conceptual understanding?
And the classic examples are like, you know,
some of Einstein's discoveries,
which really didn't require experiments.
They just required putting the pieces together.
And there's actually another project, called Machinist Miriablus,
that took a training cutoff of 1900
and tried to see if it could rediscover special relativity.
I mean, the thing is that the models those people ran those experiments on
were, like, I think, three billion parameters,
just not smart enough to do very much.
So they showed that if you hold its hand at certain points,
it can kind of gesture in the right direction,
but to do the kind of systematic reasoning and math
that Einstein had to do,
we probably need another, let's say, 10x in parameters at least.
David, what are you building next?
Are you going to build a bigger version of Talkie
and keep trying to get it to perform better?
Yeah, so there's a few things that we want to do.
So, like, obviously making the models bigger.
And so, you know, right now the model is still smaller than GPT-3 was,
although bigger than GPT-2.
So, you know, Alec kind of says that there's a bit of a phase change around like 100 billion,
150 billion parameters where the model starts to be smart enough to actually have a back-and-forth
conversation with.
Obviously, scaling up the data set and OCR efforts, and like right now everything is just mostly English
just because we can evaluate, you know, we can quality check the English text because we're all native English speakers,
but we want to obviously broaden this repertoire.
Like working on the filtering is obviously a big one.
And then there's all the stuff I mentioned, like,
how do you even evaluate the forecasting ability?
That's like another big question.
If you put the model in a robot, would that be a walkie-talkie?
You can ignore him.
Anyways, I'm going to go.
Well, David, fascinating experiment, and people can go try Talkie for themselves.
It's at tocky-lm.com.
What's a good goodbye to a podcast guest?
Don't worry about what a podcast is.
Okay, well, as Talkie says,
a pleasant journey to you, sir.
Thank you very much, sir.
Thank you, David.
Hard Fork is produced by Whitney Jones and Rachel Cohn.
We're edited by Viren Pavich.
We're fact-checked by Caitlin Love.
Today's show was engineered by Daniel Ramirez.
Original music by Marion Lozano, Diane Wong,
Rowan Niemisto, and Dan Powell.
Video production by Sawyer Roque and Chris Schott.
You can watch this whole episode on YouTube at youtube.com/hardfork.
Special thanks to Paula Szuchman, Pui-Wing Tam, and Dalia Haddad.
You can email us at hardfork@nytimes.com
with your most recent diagnosis.
We'll tell you if we think you should get it looked at.
