Hard Fork - Data Centers in Space + A.I. Policy on the Right + A Gemini History Mystery
Episode Date: November 14, 2025

This week, we talk about Google’s new plan to build data centers in space. Then, we’re joined by Dean Ball, a former adviser at the White House Office of Science and Technology Policy. Ball worked... on the Trump administration’s A.I. Action Plan, and he shares his inside view on how those policies came together. Finally, Professor Mark Humphries joins us to talk about a strange Gemini model that offered mind-blowing results on a challenging research problem.

Guests:
Dean Ball, senior fellow at the Foundation for American Innovation and former White House senior policy adviser for artificial intelligence and emerging technology
Mark Humphries, professor of history at Wilfrid Laurier University

Additional Reading:
Towards a Future Space-Based, Highly Scalable A.I. Infrastructure System Design
What It's Like to Work at the White House
Has Google Quietly Solved Two of AI’s Oldest Problems?

We want to hear from you. Email us at hardfork@nytimes.com. Find “Hard Fork” on YouTube and TikTok. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app.
Transcript
Casey, what's going on?
Oh my gosh.
So the other day, I'm walking down Market Street, and for context, this is like, you know,
maybe like one of the main thoroughfares in San Francisco.
And over the past year, this is a little bit obnoxious, but I would say four or five times
someone has recognized me from the podcast and stopped me and wanted to take a picture.
It always makes my day. Hard Fork listeners are the best.
It had happened to me just the previous week.
Well, then this weekend, I'm coming home from the gym.
And you know how you are when you're coming home from the gym.
Your face is flushed.
Yeah, you're sweaty.
You're sweaty.
Your hair's, you know, all over the place.
And this very sweet young woman comes up to me and asks for a picture.
And, of course, I'm thinking, I kind of look, you know, gross right now, but anything for a hard fork listener, right?
And she's there with a guy who I assume is, you know, her boyfriend or her husband.
And so, you know, I put on a show and I'm introducing, you know, hey, and, you know, what's your name and all that?
She hands me her phone and they go and they stand up against the street with their backs turned, you know, so they can get kind of San Francisco
in the background, and that's when I realize
these people have no idea who I am.
They are just tourists, and they want a picture
of themselves in San Francisco.
I'm Kevin Roose, a tech columnist at the New York Times.
I'm Casey Newton from Platformer.
And this is Hard Fork.
This week, Google's crazy new plan to build
data centers in space.
Is this the final frontier of the AI bubble?
Then, former Trump White House policy advisor,
Dean Ball tells us what Republicans really
think about AI. And finally, it's a history mystery. Professor Mark Humphries is here to talk about how
an unidentified new Gemini model offered mind-blowing results on a challenging research problem. It was
about Canada. It was not about Canada. It was basically about Canada. It was about sugar. It was
about the sugar trade in Canada.
Fair enough.
We are going to start by talking about space.
Finally, the final frontier, some call it.
Yes, because I have been looking into this story that I have become obsessed with,
which is that we are going to build freaking data centers and put them in space.
I'm very excited to talk to you about this.
I would say I have sort of been skimming the headlines,
so I have a lot of questions for you about this.
But I think whenever we can start an episode in space, that is a great place to start
because I don't know if you looked around lately, but who wants to be on planet Earth,
I'd like an alternative. I'll say that much.
Yes. So this has been a thing that has been quietly percolating in the tech industry.
Obviously, we have this giant data center buildout going on here on Earth.
Every company wants to build these giant data centers, fill them with these GPUs, use them to train their AI models and do things like that.
As you may have noticed, it is not easy to build data centers here on Earth.
No, I've tried.
I got nowhere.
I mean, I felt like I was building IKEA furniture.
It was like, you want me to do what?
And you need land, you need permits, you need energy to power the data center.
You need to do all of this relatively quickly, and people sometimes get mad when you try to put up a data center where they live.
Also, we are facing an energy crunch for these data centers.
There is literally, like, not enough capacity on our terrestrial
energy grid to power everything. That may get worse as people demand more and more AI and the growth
continues exponentially. Yes. So a couple companies, including just recently Google, have now announced
that they are exploring a data center in space. Which sounds like a joke when you say it. Like,
building anything in space seems so impractical, so expensive, so doomed to failure, that it truly does just sound like a
joke. But what you're saying to me right now, Kevin, is that there is a legitimate,
serious plan to try to do this. Yes. I also thought this was like some kind of crazy
science fiction moonshot thing. And it is like an experimental thing. No one is doing this like
today. But Google has put out a paper on what it calls Project Suncatcher. Yes,
Suncatcher, which sounds like a lost Led Zeppelin single, but is somehow a project to build
data centers in space.
Yes.
So they're calling this a moonshot.
They're saying, you know, this might not happen for several more years, but this is an active
area of research for them.
There are a couple other companies that have been doing this.
Jeff Bezos, Eric Schmidt, other sort of big tech folks are really interested in this
idea.
And I think we should talk about it today just to kind of give people a sense of like what
the future may hold if we continue to demand all of this power and all of these data
centers to run these giant AI models.
Yes, I think it is so worth talking about because among other things, it indicates that we are at the stage of this bubble where people have come to feel like we cannot provide enough electricity for the future we want to build on the planet that we live on.
We actually have to get off the planet to realize our ambitions.
So if nothing else, that just tells you how ambitious these companies are getting and the crazy big swings that they're about to take.
Totally.
So where should we begin?
Well, let's talk about Project Suncatcher first.
What exactly is Google proposing to do?
And what did it say about it last week?
So this was a blog post and a paper that came out last week.
They are calling this a future space-based, highly scalable AI infrastructure system design.
And basically, they have started doing some testing to figure out if a space-based data center would actually be possible.
And the problem that they're trying to solve here is twofold.
One, as we mentioned, it's very hard to build stuff here on Earth.
You need all the permits and approvals and energy.
The second is, like, the sun is a really freaking good source of energy, right?
It emits something like 100 trillion times as much energy as the entire output of humanity.
But building solar panels on Earth has some issues, mainly the sun sets for half the day, so you can only get power for half the day.
Which has long been one of people's primary criticisms of the sun.
Yes.
But if you put the solar panels and the data centers into low Earth orbit,
and you put them on something called the dawn-dusk orbit path,
which I did not just look up this week.
I definitely knew what that was from my high school astronomy class.
You can effectively give them nearly constant sunlight,
and the solar panels can be much more productive,
up to eight times as productive as solar panels here on Earth.
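For a rough sense of where a figure like that could come from, here is a minimal, purely illustrative sketch; none of these numbers are from Google's paper, and the real comparison depends heavily on the terrestrial site being assumed.

```python
# Rough illustration of why a panel in a dawn-dusk sun-synchronous orbit can be
# several times more productive than the same panel on the ground.
# All figures below are illustrative assumptions, not numbers from Google's paper.

SOLAR_CONSTANT_W_M2 = 1361      # sunlight intensity above the atmosphere
ORBIT_SUN_FRACTION = 0.99       # dawn-dusk orbit: near-continuous illumination

GROUND_PEAK_W_M2 = 1000         # typical clear-sky peak at the Earth's surface
GROUND_CAPACITY_FACTOR = 0.20   # assumed average for a decent terrestrial site
                                # (night, clouds, and sun angle all included)

space_avg = SOLAR_CONSTANT_W_M2 * ORBIT_SUN_FRACTION
ground_avg = GROUND_PEAK_W_M2 * GROUND_CAPACITY_FACTOR

print(f"Average irradiance in orbit:  {space_avg:.0f} W/m^2")
print(f"Average irradiance on ground: {ground_avg:.0f} W/m^2")
print(f"Ratio: roughly {space_avg / ground_avg:.1f}x")   # ~6.7x with these assumptions
```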
So let me ask you this, because when you say data center, I picture one of these, like, giant anonymous, you know, office complexes that's like the size of six, you know, football fields that they're, you know, building all over the heartland right now.
I assume that they are not going to build something like that in space.
No, these would be, if you look at some of the mockups that some of these companies, there's another company called StarCloud that's sort of like a startup that's got some funding from Nvidia.
And if you look at the mock-up that they have made, it kind of looks like a giant bird,
but, like, the wings are these, like, very thin solar panels, these sort of, like, arrays of
solar panels, and the kind of, the center of it is kind of this, these clusters of computers,
essentially, and it's just kind of out there orbiting in space, and the wings are kind of catching
all of the sun, and they're feeding that energy into the computers at the center of,
the cluster. Got it. So we're in one of these giant terrifying bird-like structures that are sort of
swarming over the earth in this future. And they're getting so much energy from the sun and it's so
efficient. And that is sort of driving all of the compute that's happening inside the computers.
How does whatever is happening inside the giant terrifying bird get back to us down here on Earth in a
timely fashion? That's a great question. And I asked this to a couple people I talked to over the
past week or so who've been working on this stuff. And what they told me is this is actually not
that much different from something like Starlink, right? You're sending data from a satellite or a
series of satellites back to Earth. It's not that far away, right? It's not like these are light
years away. It's like it might take, you know, a couple more milliseconds than you would take to
transmit something here on the Earth. And that is actually something that we know how to do.
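To put rough numbers on that "couple of milliseconds" claim, here is a minimal back-of-the-envelope sketch. The 550 km altitude is an assumption (roughly where Starlink-class satellites fly), and the 1,000 km terrestrial route is just a point of comparison; neither figure comes from Google's paper.

```python
# Back-of-the-envelope latency comparison: low-Earth-orbit link vs. terrestrial fiber.
# Assumptions (not from Google's paper): satellite altitude ~550 km, radio signals
# travel at roughly the speed of light in vacuum, fiber carries light at about 2/3 c.

C_VACUUM_KM_S = 299_792              # speed of light in vacuum, km/s
C_FIBER_KM_S = C_VACUUM_KM_S * 2 / 3 # approximate speed of light in optical fiber

altitude_km = 550                    # assumed LEO altitude
ground_link_km = 1_000               # assumed terrestrial route for comparison

leo_round_trip_ms = 2 * altitude_km / C_VACUUM_KM_S * 1_000
fiber_one_way_ms = ground_link_km / C_FIBER_KM_S * 1_000

print(f"LEO round trip:    {leo_round_trip_ms:.1f} ms")  # ~3.7 ms
print(f"1,000 km of fiber: {fiber_one_way_ms:.1f} ms")   # ~5.0 ms
```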
Got it. Okay, Kevin. So last week, Google puts out a blog post about this. Give us a sense of
where they are in this experiment.
So I would say they feel like they are pretty early in this process.
There are still some technical barriers to overcome,
and we can talk about those.
But they have started actually running tests to figure out things like,
well, if we send our TPUs, our AI training chips out into space,
like will they just sort of fall apart because of all the radiation out there?
And they actually did an experiment that they described in this paper
where they took just a normal like TPU, like the kind that they would put in their
data centers here on Earth, and they like took it to a lab and they hit it with a proton beam
that was supposed to like simulate a very intense kind of radiation that these chips would
experience if they were floating out in space. And they found that their newer TPUs actually
withstood radiation much better than they thought. So these things can apparently handle
radiation well beyond what's expected of them in a five-year mission. Now, if you watched
The Fantastic Four: First Steps earlier this year, you know that cosmic radiation
is what transformed the Richards family and Ben Grimm
into the Fantastic Four.
Has Google addressed that at all
about sort of any of those concerns?
They did not address that, to my knowledge.
They did address some other potential hurdles.
One of them is, like, if these chips glitch out or break,
how do you fix them if they're in space?
And I asked a couple people who have worked on similar projects,
and they basically said, yeah,
we got to figure out how to, like, get robots up there
to, like, fix the data centers.
Got it. So they'll focus on using robots for that. I guess that makes sense.
Now, am I right that Google is actually planning to do some kind of like test launch within the next couple of years on this?
Yeah, they are planning to test this in 2027 by launching two prototype satellites in partnership with Planet,
which is a company that sort of sends up these little tiny satellites into space for mapping and things like that.
And that is their plan.
There are also other companies, including StarCloud, which is also planning to send up some prototypes
pretty soon.
So they're moving forward with testing on this.
I will say, I think this is probably not going to happen in any real way for at least a
couple of years, in part because things are still very expensive to send up into space.
It is not right now economically feasible to send up a whole bunch of chips and a whole bunch
of satellites up into space. It costs many times more than what you would need to build a
comparable data center here on Earth. Yeah, and people here on Earth are saying that building the
data centers that we're doing here on Earth are not economically feasible, right? So I can't
imagine how much more out of control the costs are going to be once you leave orbit. One thing I
thought was interesting in the Google blog post was that the company tried to place Suncatcher
in the lineage of self-driving cars,
so what is now Waymo, and quantum computing,
which hasn't quite become a mainstream technology yet,
but has made a lot of strides.
You know, just within the past year,
we did an episode on it not all that long ago.
And they're sort of saying, like,
Suncatcher is kind of one of those,
where we are willing to work on this for 8, 10, 12, 15 years
to make it into a mainstream technology.
And so I took that as Google saying,
like, hey, this is not just like some,
crazy little experiment that a couple engineers
are working on in their spare time. It seems like
they're serious about this. I think they're serious about
this, and I think they are looking
out to a future
five, 10, 15 years away
where kind of
the demand for AI and
AI-related tasks is just
essentially infinite, right? It's like
this is not something that
10% of people are using every day.
This is something that 100%
of people are using
constantly, that there are like sort of entire companies or sectors of the economy that have
been sort of fully turned over to AI. And maybe that happens and maybe it doesn't. But if it does
happen, we're going to need a lot of energy and a lot of data centers and we may run out of
land and power here on Earth. Now, something that I did not realize until after I had read about
Suncatcher is just how many other companies are looking at doing the same thing. Can you kind of
give me a high level overview of like who else is playing here? And does it seem like anyone else is
further along than Google is right now? Yeah. So as I mentioned, there's this company StarCloud,
which is a Y Combinator startup that got some funding from Nvidia. They are sort of the main
ones here doing this. There's also a company called Axiom Space that is doing this. And we think
that there are some Chinese companies, or at least one Chinese effort to do a space-based
data center, although they've been a little bit vague about the details there. And then the
Information had an article about some comments that Eric Schmidt and Jeff Bezos have made
suggesting that maybe they are also interested in or looking at doing something like this.
Well, you know, Jeff Bezos just sent Lauren Sanchez to space. So you have to wonder if that was
kind of a first step towards something in this vein.
Yes.
You know, one thing I think that is interesting about this approach, Kevin, is that, as you
know, we've seen an increasing amount of resistance from people in sort of local communities
to having data centers put in their towns or near their towns.
They're worried about how it's going to affect the cost of energy for them, right?
They're worried about water usage or the environmental impact.
And so I think that, you know, if this sort of thing comes to pass, we'll
have gone from like, you know, just like the NIMBYs saying not in my backyard to this new group of people that I'm calling the noms, that are saying not on my planet, you know, and they want all the data centers just built up in the sky.
So do you think noms are going to become a sort of major political force?
I do. Although I also think that eventually people may start to not want them in space either. But it's going to be harder for them to protest. You got to get in a rocket, go up there into low Earth orbit. It's very inconvenient.
Now, why wouldn't people want them in space?
Well, there are various people who think that this is going to create a lot of, like, space debris and things like that, that would eventually be bad.
I talked to some folks who, you know, work on this stuff, and they were like, they don't think that's really going to be a big deal.
There's all kinds of stuff up in space now.
We generally don't pay much attention to it.
But I can see this sort of sounding to people like Elon Musk, you know, proposing to build colonies on Mars or something.
Like, it's just like, it's like too futuristic, it's too sci-fi, and it sounds like these very, you know, rich companies and individuals trying to kind of flee from their problems here on Earth by, like, sending stuff into space.
Here's what I would say.
I would love to be, like, living at a time when one of the top ten concerns I had in my life was space debris.
If I ever get there, Kevin, I will be in heaven.
Heaven!
Well, you'll be in low Earth orbit, technically.
Exactly.
Now, I have a question for you.
Yeah, yeah.
Would you go to space?
Yes, absolutely.
Would you go to space to fix a data center?
I mean, what is the salary for that job?
Very high.
I mean, there's probably a certain price for which I would do it.
But here's the thing, you know, I'm not handy around the house.
Yeah.
It's like, if I, you know, if ChatGPT doesn't know what to do, I'm calling the handyman.
Yeah.
Okay. I will just say that I think we should make
an offer to Google, which is
if you guys get this Project Suncatcher
up into low Earth orbit, we
will do a podcast episode where we go up there
and cut the ribbon. You're just dying
to be exposed to massive levels of
solar radiation.
You know, I just think it'd be
fun.
When we come back, the ball is in our
court. Dean Ball talks about how
he crafted the AI Action Plan.
Well, Casey, recently we've been talking about some state-level AI regulations that have been passed and signed into law.
But today we're going to have a discussion about
national AI policy.
Yeah, I think that the states have been acting because the federal government has not really
passed any legislation related to AI just yet.
And that's left us with a lot of questions around how the administration has been thinking
about AI.
It's been a little confusing.
I think especially, you know, in this administration, it has not been particularly clear
to me what President Trump and his allies believe about things like whether or not
we are headed towards some kind of an AGI moment or how the federal government should try to
protect against some of the risks of very powerful AI systems.
So the conversation that we're going to have today, I think will help us answer some of these
questions and just kind of get a better sense of like what is happening in Washington,
especially on the right, when it comes to AI and AI policy.
Yeah.
So earlier this year, Dean Ball spent several months working as the White House's senior policy
advisor for artificial intelligence and emerging technology. He was brought into the White House
in order to lead the drafting of the White House's AI action plan. And in that role in the White
House, Dean not only got to see how the AI policy sausage was made at the highest levels of
government, he actually got to make the sausage himself. He was sort of responsible for taking
all these different ideas from the various parts of government and putting them together into
a document that would represent the administration's sort of official view on
AI. Yeah. And while he was there, Dean also got a good sense of who are the various factions on the
right when it comes to AI policy. What do they believe? What are the competing incentives?
Who has whose ear? And I think if you want to understand the likely path forward for AI regulation
over the next few years, that's a really important part of the conversation. Yeah. So Dean left
the White House in August after the AI Action Plan was released. And since then, he's become
a senior fellow at the Foundation for American Innovation and the author of Hyperdimensional, a
newsletter about AI and policy. And because we're going to be spending a lot of time in
this segment talking about AI, let's do our disclosures. I work for the New York Times,
which is suing OpenAI and Microsoft over alleged copyright violations. And my boyfriend
works at Anthropic. Let's bring him in.
Dean Ball, welcome to Hard Fork. Thank you both for having me. It's so good to be here.
So how did you end up at the White House earlier this year working on AI policy?
What was your background before that?
I was a think tanker.
A lot of it was not tech policy.
A lot of what I did was state and local policy.
But I was always very interested in tech.
And basically, when the AI policy conversation really took off sort of early 2023,
I made the decision to start writing about AI.
Basically, as a part-time gig, just like purely on the side. I
wasn't being paid for it or anything.
And then eventually I decided I really liked it, and I was finding my voice, and I was hired
by the Mercatus Center at George Mason University to go spend some time there, spent about a year
there, and then was recruited to the White House on the basis of primarily my writing on
Substack, and my Substack is called Hyperdimensional.
It's where I talk about, you know, AI stuff.
The substack to White House pipeline, I feel like that is, you are not the only person who has
posted their way into a job.
in the federal government?
You can post your way to the federal government.
It's really true.
And probably, I'm probably like a big chunk of it was probably my posts on X, really,
which is maybe even more scary.
But, yeah.
So, okay, you get this call.
You go to the White House.
What did you find there with respect to AI policy?
Was there like a coherent single view of how AI should be
governed and regulated?
I would say there are coherent intuitions,
but the field is so nascent,
and there haven't been a lot of fights
where dividing lines have really firmed up yet.
I think, by the way, this is true on the left as well.
I don't think that those intuitions have formed yet
into like a lot of different sort of very specific policy positions.
I don't think they've concretized
yet, is really what I'm saying. I think, though, you know, there's a combination of
excitement and some worry and some confusion, probably equal parts, which is, you know, in a macro
sense, that's probably roughly where I am too, actually, and that sounds about right to me.
You say there were some coherent intuitions about AI in the administration. What were those
intuitions? I think coherent intuition number one is AI is the most important
technological, economic, scientific opportunity that this country, and probably the world at
large, has seen in decades, and quite possibly ever.
I think basically everyone shares the assessment.
This is going to be extremely powerful, and it's going to be really important.
And second intuition that directly follows is there are going to be some risks associated
with this that are sort of familiar to us and things that are cognizable under existing
sort of policy frameworks, and others which might be more alien and might be, like,
risks that we don't really even have concepts for as clearly yet. And then, you know, maybe
the third intuition is, regardless of those risks, it feels like AI is going to play a very big
role in the future of, like, American global leadership.
Yeah, that's really helpful, and kind of helps me get a sense of like the lay of the land when you arrived.
I'm wondering if you can help me understand the kind of intra-right factions when it comes to
AI, because I've, I think I've identified like at least two different views of AI that I've
heard coming from prominent Republicans.
And maybe you could call them like the David Sacks view and the Steve Bannon view.
David Sacks, the president's AI czar, is constantly talking
online and on his podcast about, you know, these AI doomers who he thinks are sort of ridiculous
and are overhyping the risks of AI and trying to sort of, you know, get their way on policy,
calling them woke, implying that they're sort of trumping up these fears of, no pun intended,
of job loss and things like that to sort of get their way when it comes to policy.
Then there's Steve Bannon, who has been, you know, out there talking about the existential risks from AI.
And you and I were both at this Curve conference, actually all three of us were there a few weeks ago, where one of Steve Bannon's sort of guys was there and gave this very fascinating talk about how he thought, like, he was sort of in league with the so-called doomers who believe that this could all go very badly very soon.
Are there more views on the right than those two, are those sort of the primary camps?
No, I think that there's a whole spectrum.
I can't speak for either David or Steve, of course. I
would put them on like roughly polar opposites in terms of, you know, how conservatives talk
about this issue. But I think there's a whole spectrum in between. So first of all, you've got
national security people. There are national security people who don't actually know a ton about it,
and this is, again, both sides here. You know, they're just, they think of this as a strategic
technology that's important for U.S. competition with China and other things. And also maybe they
think there's some national security risks, but they're not really thinking about, like,
the domestic policy. They're not really thinking about regulation. They're not thinking EA versus
doomer. So that would be one. I think also, you know, related to the sort of Bannon viewpoint,
but maybe, you know, more toward the middle would be like people that are worried about
kids safety primarily. There's a lot of conservatives who would distance themselves from the
AI doomer view, but who would also distance themselves from the pure accelerationist
vision, and they would use the lessons we've had with social media as an example.
So sort of that kid's safety viewpoint, for these people, very often the issues of things
like LLM psychosis, of course, teen suicidality with chatbots being another very salient
issue for this group, for everyone, I hope.
But yeah, there are others in between, and I guess I would put myself somewhere in kind of
the middle, in a weird, weird fusion.
Where does industry fit into that spectrum?
Like, my sense from the outside is that industry groups and lobbyists have had a lot of
success in this administration in getting what they want.
Where are they in those conversations?
I think it really depends on incentives.
People in policy conversations very often will refer to, like, industry as being this
kind of monolithic, coherent entity.
It's, of course, not.
And there's different people that have different incentives.
So, you know, if you're a U.S. hyperscaler, you don't hate the export controls, you know.
You don't want more competition for the same chips that you're trying to buy.
Meaning like Microsoft or Google or an Amazon.
Yes, Microsoft, Google, Amazon Web Services, et cetera.
You don't hate that because, like, A, you don't want Chinese firms competing for your chips.
But even if it's not the same chips you're competing over, you don't want to be implicitly competing over space at TSMC
to make the chips. So, you know, hyperscalers, you know, they will definitely have, like, nuanced
positions on export controls, but by and large, like, their incentives are not to hate them,
and they largely don't. Um, Frontier Labs, I mean, they want to make money selling tokens to
people. So they want access to chips. Uh, but, you know, I think there's some people who believe,
and it's from a political theory perspective, it's not wrong to believe that, like, ultimately
they want to create moats. And I think there's a lot of ways you can make moats. It seems to me like
the main way they're trying to make moats right now is through infrastructure, that they've
basically all come to. Anthropic today announced a $50 billion commitment to build their own
data centers. Google obviously does this. OpenAI does this through Stargate. Meta does this.
XAI does this. Everyone does this. Everyone's building infrastructure. And the basic view is like,
well, the models maybe are not your moat per se, like the parameters of the model are not
your moat, but perhaps the infrastructure is.
And so, you know, these are all competing interests and no one's making illegitimate arguments here.
Everyone's operating from incentives.
And, of course, the job of government is to sort of solve for the equilibrium.
Dean, is there a MAGA view of AGI?
Not yet.
No, not really.
I don't know that there's a view of AGI in any political persuasion.
I think MAGA might actually be the closest to having one.
And I think at the moment, maybe the persuasion,
at least from what I see online, is, like, maybe it's sort of more doomery.
I believe we saw a bipartisan bill introduced over the past week that would require
reports of job losses due to automation, which suggests that there is some increasing attention
to that likelihood.
Yeah, well, I mean, so there's this big question, you know, in the AI field, like places
like the Curve and places like Lighthaven, there are these gatherings of various sort of
doyens of the AI community, and they get together.
And the main question that people talk about is like, when are the pitchforks going to be out for this technology and what is going to cause the pitchforks to come out?
And I have come to the conclusion that rather than it being a singular issue, it's going to be this kind of miasma of issues.
It's going to be like, you know, it's sloppification.
It's not safe for kids.
It's driving up your electricity prices.
It's using all the water.
It's taking your job.
And it's taking your job, and also it's going to kill everyone, and also, by the way, it's fake.
It'll be all those things and kind of this weird sort of vichyssoise.
The aspect of the AI action plan that I find the most annoying is the attention on the ideology of the chatbots and the suggestion, you know, that they should be able to, you know, respond in some ways, but not in other ways.
Could you kind of illuminate the discussions that were being had and what the administration actually wants out of these models?
Yeah.
So I think the main point here, first of all, like the most important thing, you're talking about the woke AI executive order.
Yeah.
What it is, as it's actually phrased:
This is an executive order that deals with federal procurement policy.
In other words, this is not an executive order.
It's not a regulation on the versions of AI models that a company like Anthropic or OpenAI or any other company ships to consumers or private businesses.
This is purely about the versions of their models that they ship to the government.
And the government is saying in this case, we do not want to procure models which have top-down ideological biases
engineered into them, we would like our government employees to have access to models which
are, you know, I think objective is a really hard word. Obviously, we've been, like, debating
about, like, what is truth for, you know, since there was language, right? So I don't think
we're going to resolve that. I have a feeling the General Services Administration guidelines
will not resolve that issue. You know, I think it's folly to even try. And I think the executive
order doesn't try, you know, the executive order steers clear of doing so. The executive order says,
Instead, you just, we don't want you as the developer imposing some sort of worldview on top of the model.
Well, good luck with that, I guess.
Well, I want to ask one follow up on that because my sense is that, you know, the Trump administration and Republicans in Congress have been very upset with how the Biden administration sort of jawboned, how they applied pressure to social media companies to take down, you know,
misinformation or what they considered misinformation about the COVID vaccines or things like that.
That was seen as like very inappropriate. In fact, they're like ongoing investigations of the
contacts between the Biden White House and the social media companies over this issue.
Yes. And then we turn around and we see this like woke AI executive order where it's like,
I understand the subtle point you're making about, you know, this is not regulating the models that
the companies are releasing to the public. It's just the ones that they're selling to the government.
but, like, we all know that there's one set of models, right? And they get built
and they get sold to various customers. And I think, you know, it's reasonable to see that and think,
okay, this is the Trump administration doing exactly what it got so mad at the Biden administration
for doing, which is to contact the tech companies and tell them, hey, this is how your product
should be working, this is the kind of thing you should be allowing and not allowing. And I don't
know. Does that seem at all to you hypocritical?
Well, so, look, I think that there is an inherent tension here, and this is a tension that
has existed on the right, and it's particularly existed sort of post-Trump 45, post-President
Trump's first term. There is this argument that exists of should we stick to our principles
that the government shouldn't be doing this kind of jawboning, or should we accept that the
government has this power, and now we need to throw it back
at the left, right?
And I can tell you that I personally
have always definitively been on one side
of that argument, which is the former view.
We should stick to principles.
We should not fight, we should not.
No jawboning from anyone.
Yeah, you shouldn't do that.
I mean, like, you know, you shouldn't do that.
At the same time,
I think the government totally has a right to say,
and again, what we're talking about here,
like I wouldn't think of this as like a model training thing.
I would think of this as the sort of thing
that can be relatively,
like trivially easily changed by the developer, right?
So models that are sold to the government
already have compliance burdens
that are significantly higher
than this executive order, right?
They have to comply with the Freedom of Information Act.
They have to comply with the Presidential Records Act
if they're sold to the White House.
There's all sorts of data stewardship laws
that are way more difficult
than anything in the Woke AI executive order.
The woke AI executive order basically says,
like, you need to disclose
in the procurement process to the agency,
from whom you're procuring, you need to disclose, like, what the system prompt is.
You can change a system prompt for a specific customer.
It's not that hard.
And I would only point out that, like, I will just say it here right now, that, like, if you did
try to use federal law to compel a developer to change the way they train the models that
they serve to the public, that is unambiguously unconstitutional.
It is a violation of the First Amendment.
You are violating that company's speech rights, and you are violating the speech rights of the American citizens who might use that model.
So it would be quite dire and grave for the government to do that, and I am confident that the woke AI executive order was not intended to do that.
So, Dean, I really enjoy your newsletter.
I've been reading it since before you joined the government.
I continue to read it today.
And one point of view that you advocate for with great frequency
is that most, if not all, AI regulation should be done at the federal level.
And you spend a lot of very valuable time looking into how states are attempting to regulate
AI in ways that I think you believe are mostly bad.
Could you kind of give us a high-level overview of your interest in this subject and what
you see states doing that concerns you so much?
Yeah, so I come from a state and local policy background, I should say.
And so, like, my view is that a lot of the real governance in this country happens at the state and local level.
And I mostly, now that I live in D.C., I mostly say, thank God that that's the case.
That being said, there are some things that inherently implicate interstate commerce.
And I think that models which are trained to be served to the entire world, which cost a billion dollars to train, that the standards by which those models are trained and evaluated and measured,
you know, I think those have to be federal standards because you can't have competing standards.
Now, maybe we don't end up having competing standards.
Maybe what happens is the biggest state regulates, and that happens all the time in America.
There's many, many technologies where the state of California or the state of New York or somewhere like that, Texas sometimes, has an implicitly federal effect, one state doing lawmaking.
I think that's a failure mode.
I think it's an issue of our, a structural issue of our constitution that the founders couldn't really possibly have contemplated because like the notion of economies of scale didn't quite exist for them.
And so I think it's a really, really difficult issue of Supreme Court jurisprudence.
Right now it's the case that California by default is the central regulator of AI in America.
Thus far, I think they've done a better job than I would have guessed, but still not a great job.
So I was, you know, broadly supportive of their flagship AI bill from this year, which was called SB 53.
It is a transparency bill that applies only to the largest developers of AI models.
And to me, it seems rather reasonable overall.
Let me bring it back to maybe some more, like, contemporary AI concerns, though, which is, you know, earlier when you were describing some of the kind of, you know, landscape in Washington and who's concerned about what, you mentioned there's this group of Republicans who are very concerned about
chatbot psychosis, child safety, teen suicidality.
Those are all harms that are present today that seem to be encouraged on some level
by products that are out on the market.
And we have a Congress that is very loath to pass really any regulation at all when
it comes to the tech industry, whether that's for ideological reasons or just logistically,
it's very difficult to get Republicans and Democrats to agree.
Or the government's shut down half the time.
That's also been increasingly an issue.
And so in such a world, I can very much understand the point of view of a state lawmaker who says, well, I don't want the kids in my state to kill themselves.
Like, we're going to do something about this right now. And we're not as dysfunctional as the federal government. So we're going to get in there. And we're going to try to do something. So how do you view that dynamic? And is your desire truly that the states would just say, hey, we're not going to get involved. And that's on Congress.
No. So, I mean, look, I understand the incentives of the state lawmakers, like, for sure. I think Congress needs to act. Like, my, my view is more proactive. My view is like, Congress needs to deal with this. This is a problem that Congress needs to deal with. I don't blame the state lawmakers. I blame, sometimes I do. Sometimes I blame them for poor statute drafting. There's no excuse for that, right? It's your job. Like, and I say this sometimes to legislators. And they're like, well, we'll let the courts figure that out. And I say, no, you took an oath to the Constitution too, not just the judges. But in
the general case of, like, I want to protect kids in my state? No, of course I don't blame them
for that. Yeah. I want to zoom out a little bit and ask a question about AI and polarization.
It feels to me right now like AI is kind of in this weird, confusing, pre-polarized state.
Like there's this sort of machine that sort of, when an issue gets important enough or salient enough
to people, it kind of gets run through the polarization machine. And, like, it comes out the other side
and, like, Republicans take one position
and Democrats take another position.
Do you think something similar is going to happen with AI
where, like, it will become very predictable
which view you hold on AI
based on which party you vote for?
I think what's more likely is that over time,
it splinters, and there's, like, different things
that people talk about.
So there's going to be data centers,
and there's going to be, you know, China competition
that'll be an issue,
and there'll be, like, the software side,
regulation, there'll be the kids issues. Just like today, you know, we don't talk about computer
policy or internet policy. Well, internet policy used to be a thing. In the 90s, internet
policy was a thing. But now it's like social media, you know, privacy, whatever else. I think
it'll splinter in that way. Will those issues themselves be polarized? Yeah, I mean, in some
ways they will be. Yeah. I do hope, though, that there's certain parts of, and this is a very
important part of, you know, the action plan in my view, too. The action plan, like, not
every single aspect of an issue has to be polarized. There are legitimate tail risk type
events, national security issues that I think it is the obligation of the federal government
to deal with in a mature and responsible way. I've heard Ezra Klein before, I love this turn
of phrase of his. Who? I've never, I've never heard of him. Yeah, we're not familiar with his work.
Yeah.
Ezra, I've heard him describe government as a grand enterprise in risk management. I think that's true
in a fundamental sense. I think that's very true. And, um, so there are certain things that we just
do need to deal with, and the action plan tries to make some incremental progress on some of those
things. And of course there's a lot of things we need to do to embrace the technology and let it grow
and all that too, and I think that's an important part as well. But that's less controversial to say
as a Republican. Um, I think the maybe more controversial thing right now to say is, like, yeah, there
are like legitimate risks. And I hope those things can be bipartisan. The dealing with those risks can be
bipartisan because really like if we can't deal with catastrophic tail risk, then we do not have a
legitimate government. Like the whole point of government is to deal with this issue. And we should
just, as Michael Dell said about Apple in the 90s, we should throw the thing out and return the money to
the shareholders if like if we can't manage these things. I really do believe that. So let's talk about
that point specifically. When I look at AI policy in America today, I mostly see the big
frontier labs getting just about everything they want, right? Like, it seems like there is a high
degree of alignment between the labs and the government. And when it comes to, like, safety
restrictions, for example, I don't see a lot that is holding them back from, you know,
building their next two or three frontier models.
So there are components of the AI Action Plan that are meant to address some of those
catastrophic risks that you mentioned.
Tell us how you envision that actually working.
Where is the moment where the industry stops getting everything that it wants?
Well, I would say there's so much you can say here.
I think the first thing is that many of the people who work at the frontier labs, I can't
speak for the labs, of course, but knowing a lot of them personally, including up to
very senior levels, I can say that they have an earnest desire to deal with these problems.
And they invest real resources as companies.
And part of the reason they do that is because they have incentives, because their companies
would be bankrupt if they, e.g., caused a pandemic.
Right?
And the other thing is that, like, a lot of these problems are super tractable.
Like, we don't have to act as though these things are, like, the hardest
problems we've ever dealt with, to me as someone with experience in public policy. And by the way,
this is the posture of, like, people that I met in government who are 30-year veterans of thinking
about tail risks. To them, you bring up, like, AI bio risk or AI cyber risk and they're like, yeah,
sounds like a serious risk, okay, there's a hurricane that's tracking toward Florida, let me go deal with
that, right? Like, um, these things come across your desk every day when you're in government.
These are eminently tractable problems in the near term with current technology and
technology that I think we're going to have in the near future, without spending a ton of money,
there's a lot of traction you can get on them that doesn't involve really in any meaningful way
slowing down AI development. I want to push back on the idea that there's this trade-off between sort of
mitigating tail risk and slowing down AI development. Now, will that always be the case? No. At some
point, there will be trade-offs. We'll have to make those trade-offs and they'll be hard, and it's like
hard for me to know where I'll come down on that because it'll depend on the particulars. But right now,
we have this great opportunity of like, oh, we can accelerate AI development and we can also
have better biosecurity, which, by the way, was a problem before ChatGPT existed. There was a whole
pandemic about it. So, like, yeah. Sometimes I talk to people who work on AI policy or just,
or just, you know, work on AI and think about policy. And they'll say things like, you know,
I don't think we're going to get any meaningful AI regulation until there's a catastrophe.
How about you? Do you, Dean, think that it will take something like that to really catalyze significant
movement on AI policy in Congress?
Possibly. I mean, like, I can't say that, like, that certainly a catastrophe is plausible and could
catalyze movement in Congress, for sure. I think there are other ways to achieve this. I really do.
Like, I think you can make incremental advancements in the absence of a catastrophe.
Now, it depends on, like, a lot of people in the AI safety community will say this,
or people that are at labs who care about AI safety also.
They will say this.
That's, like, a very Anthropic type of position.
And I don't say that as a pejorative, by the way.
To be totally transparent.
Like, I've heard this from people at lots of different labs where they're sort of like, yeah,
I don't really think, like, we're capable
of, and it's not so much a knock on, like, this particular Congress or anything. It's just like,
I don't think the government is capable of regulating things in advance. I am okay with government
being in a mostly reactive posture, particularly with respect to things that aren't tail risk.
Tail risks are the one exception because you, you know, those things can be very, very damaging
and so you want to do some stuff in advance to mitigate that. But when it comes to like most other harms
from AI, I'm comfortable with government just really reacting to realized harms in areas where
it's like, okay, well, it's a realized harm that we've seen. We think that's going to continue
happening. It doesn't appear to be resolved adequately by the existing system of common
law liability that allows people harmed to sue the people who harmed them. And it can be
meaningfully addressed through a targeted law. And if all those conditions are satisfied, then we
should totally pass that law. I think kid safety is in this category. Yeah. Yeah. Well, Dean,
thanks so much for coming. Really fascinating conversation. And people should check out your
writing. Your website is Hyperdimensional. It was a real pleasure, guys. Thank you. Thank you.
Thanks, Dean. When we come back, we'll have more to say about the Canadian fur trade than we've
ever said before. It was not the Canadian fur trade. It was the upstate New York sugar trade.
They're related in ways I don't understand.
Well, Scooby gang, it's time to get in the old mystery machine, because today we've got a mystery.
That's right, gumshoes.
Grab your notebook and your magnifying glass because there are a few clues and we're about to crack
the case wide open. And this one is a history mystery. It involves an experiment that a historian ran using
an AI model. And we're going to talk about it all with the historian in just a second. But Casey,
to set the scene here a little bit, there are a lot of rumors going around right now about this new Google Gemini
3 model. There really are. Gemini 2 came out almost exactly a year ago, last December,
and while Google has updated it throughout the year, we have been hearing an increasing number
of whispers this fall about Gemini 3 and rumors that it really is pretty great. So Alex Heath
reported a few weeks back that he expected Gemini 3 to come out in December. And one thing that
happens in the run-up to the release of new models is that companies quietly test them.
And that brings us to our story today.
Yes.
So Mark Humphries is a history professor at Wilfrid Laurier University in Ontario, Canada.
He does research involving a lot of old documents and trying to decipher the handwriting on these documents.
And he is also kind of an AI early adopter.
He's got a substack called Generative History, where he's been writing about his experiments
using AI to solve some of his research problems.
And recently he had a post that really caught our attention
called "Has Google Quietly Solved Two of AI's Oldest Problems?" in which he explained a really
fascinating experiment that he ran using one of these kind of test models inside Google's
AI studio, which is a Google product where you can kind of experiment with different models.
And he says that the responses that he got back from this mystery model made the hair on the back
of his neck stand up.
Like this was so astounding to him, not just because they were very good, but
because they seemed like a different kind of capability
than ones he had seen in any other AI model.
Yeah, and so the mystery is what model was Mark using,
but I think the bigger story is
what does it mean that this historian
was as impressed as he was
with this very unusual thing
that he found a large language model doing?
Yes, and we should say, like,
it is very hard to determine exactly which model
anyone is sort of being shown at any given time,
the way these pre-release tests go.
Companies will show 1% of users one model
and another 1% of users a different model
and kind of ask them to compare the two.
And they give them weird code names.
They don't tell you what you're using.
Exactly.
So there's still some uncertainty around this.
This may have just kind of been a one-off.
We will obviously need to see
what Gemini 3 actually does when it comes out.
But for now, I think this is a very interesting story
because it points to the way that these AI models
are starting to do things that surprise even experts
in their fields. Yes. And so for those reasons, it's time to bring in Mark Humphries and talk about
what he found. Kevin, you know the difference between an American and a Canadian historian?
What's that? Canadian historians process "dah-ta" while American historians process "day-ta."
Is that true? Yeah, that's true.
Well, let's talk to Mark, and he can pronounce it however he wants.
Hell yeah, brother.
Mark Humphries, welcome to Hard Fork. Thanks for having me.
Thanks for having me.
Where are we catching you today?
Are you up in Canada?
What's going on up there?
I am. I'm in Waterloo, Ontario, in Canada, in my office at
Wilfrid Laurier University.
So Waterloo, so you must just be surrounded by AI computer scientists at all times.
There are a lot of startups and a lot of AI researchers and a lot of computer companies in
Waterloo, yes.
Home of the Blackberry.
That's right.
That's right.
Yes.
RIM Park.
So before we get into the specifics of your most recent brush with this new mystery AI model,
can you just tell us how you've been using AI in your history research over the last year or so?
Sure.
So my research partner and I, Leanne Letty, whose lab this all comes out of as well,
have been working on trying to develop ways of processing huge amounts of data,
mostly handwritten, related to the fur trade.
And that involves a couple of things.
It involves trying to recognize the handwriting accurately,
but it also involves trying to basically generate metadata for all of,
you know, tens of thousands of records to try and understand what's in those records
and make connections between them.
So we're kind of operating at tasks that are kind of at the,
just at the threshold of what AI models are capable of doing.
So it's been kind of interesting to watch over the last couple of years,
the models get better and become capable of doing some of these things
and then finding out new limitations as we go along.
Yeah. And tell us a little bit about the kind of work that you do in general. I know you're really focused on using older documents in your work. What kind of stories are you trying to put together?
Yeah. So, you know, I've always been really interested in stories of ordinary people. So in the fur trade, when you're trying to understand, you know, what happened to ordinary people in the 18th and the 19th centuries, the problem is many of them were illiterate, didn't write. And although they kind of appear in a lot of documents,
that are generated in the kind of course of living.
You know, these are marriage, death records, account books, stuff like that.
It's a lot of detective work.
It's a lot of trying to piece together stories from fragmented documents,
what somebody bought in one place, a contract they signed somewhere else,
a baptismal record somewhere else.
And so a lot of this is trying to do that,
and that's what Dr. Letty and I've been trying to do with our graduate students,
is to try and piece together what these stories about ordinary people can tell us
in the fur trade and in the western part of North America.
from kind of the period of about 1760 through until the early 19th century.
You know, it's interesting, Kevin, because every time I go to a Starbucks and they try to give me a receipt,
I think I don't need any paperwork about what just happened here, right?
I'm just going to take my mocha and get out of here.
But what Mark is saying is that that document could be of huge value to a future historian
in understanding our lives.
Exactly.
Yes, they will want to know.
Let's get into it.
So, Mark, tell us about this experience that you had with Gemini, the AI model that you were
trying to use for this transcription, basically taking this very old document about the fur trade
and plugging it in and saying, transcribe this. Tell me what this says. Yeah, so, you know,
I think to understand why this is kind of a significant or looks like it could be a fairly
significant development. It's important to understand kind of where we've come from in the last
two years on this, right? So when GPT-4 first came out in 2023, it could kind of sort of read
handwritten documents. It would be mostly errors, but you could kind of see that
it was beginning to be able to do this.
And it's been really easy for kind of companies and systems to get up to about 90% accuracy.
And then everything above 90% has been pretty difficult.
And the problem is that that last 10% is the most important part, right?
So that if you're interested in people's names, you're interested in amounts of money,
you're interested in where they were, you've got to get that stuff right in order to make it useful.
And up till about, I guess, you know, when Gemini 2.5 Pro came out,
back last spring. We were kind of still in that era. And Gemini 2.5 Pro got up to about
95% accuracy. And that's really good. So what I was interested in is when we began to see reports
kind of on X that there were new models being tested by Google in AI Studio, which is their
kind of playground app. I was just curious, how much better was this gonna get?
So, okay, you are hearing these rumors that there's this new mystery model inside
the AI Studio that Google tests new models in before they're released. What do you do?
Yeah, so Dr. Letty and I have a corpus of 50 different documents that we've been
using to benchmark kind of how these models improve over time. They're all documents that we are
pretty sure not in the training data because we've either taken them ourselves or they've been
kind of from sources that are not typically online. And you can't be 100% sure, but it seems to be
the case. So I started to put a few of those documents in, and for your listeners who are not
maybe aware, the way that the testing of these types of models often works, is you kind of
have to put in the document dozens of times before you get a hit on the model you're hoping
to test because it kind of randomly pops up. So it's not an easy thing to do. I managed to test
about five of our 50 examples, about 1,000 words, and the results were impressive, to say the
least, in the sense that the error rate again declined by about 50% from where it had been
with Gemini 2.5 Pro. It got to about a 1% word error rate, which means you're getting roughly one
word in 100 wrong. But that can include capitalization errors, punctuation, stuff like that.
So that in itself is really significant. No models have come close to that before.
Human experts who do transcription for a living are at about a 1% error rate. So that itself is
fairly important.
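(For listeners curious how a figure like a 1% word error rate is actually measured: it's the word-level edit distance between a human reference transcription and the model's output, divided by the number of words in the reference. Here's a minimal sketch of that calculation; the sample sentences are invented for illustration.)

```python
# Minimal sketch of a word error rate (WER) calculation: word-level edit
# distance between a reference transcription and a model's output, divided
# by the number of reference words. Sample strings are made up.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words
    # (substitutions, insertions, deletions each cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one wrong word out of ten reference words -> 10% WER.
ref = "to one loaf of sugar at one shilling four pence"
hyp = "to one load of sugar at one shilling four pence"
print(f"WER: {word_error_rate(ref, hyp):.0%}")  # WER: 10%
```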
And your sense of having used this new experimental model, did that just come from
you inputting dozens and dozens of queries?
And every once in a while, you would just get a result that was radically better than
the others.
And you thought, aha, I must be getting the new one.
Or were there any other signs about what Google was showing you?
Well, it's A-B testing.
So what that means is normally in AI Studio, you put in a query and you get a response.
And when you get the A-B test, you get two responses.
And it asks you to rate which one's better, right?
And the labs do this in order to get feedback on, you know,
is the model actually better on specific types of tasks than other ones, right?
So you might have to do that 20 or 30 times until you get one of those A-B tests with the two responses.
And then the differences were pretty notable.
So you said the overall error rate fell by about 50%.
But that was not actually what impressed you the most about this new model. What impressed you the most?
Yeah, so first of all, that
was, you know, impressive. And then I was curious, okay, if it's gotten to this point, how is it going
to do on tabular data? And as historians, one of the things you work with, you know, to go back
to your Starbucks example, are receipts and ledgers that come from, you know, merchants in the
past. And a lot of that's fairly boring. But if you want to know where somebody is, where they
bought their coffee one morning and you want to trace that person's movements, you can use these
types of documents to do that. You can see what they bought and all of those types of things.
The thing is, to this point, models have been pretty bad on tabular data. It's often
kept kind of like a cash register receipt is kept, so it's kind of just on the fly.
Nobody's expecting people to necessarily read it down the road. So it's difficult to interpret just
by looking at it. It's also sometimes quickly written, so it's even worse handwriting than people
are used to. And because it's historical documents, in this case, I'm dealing with records from
18th century New York State, upstate New York, in Albany. And those records are written in pounds,
shillings, and pence. So that's the old system, a different base than we're used to using,
in which you have basically a different form of currency measurement. And so when I dropped in a page,
just kind of at random from this ledger, I was just curious to see what I get back. And suddenly,
it not only came back in a near perfect transcription,
which itself was kind of remarkable given how difficult it is to make sense of what's actually
on the page, but as I started to go through it, I was looking for errors,
trying to find errors.
And I began to realize that some of the things that I was seeing on there that looked like errors
were actually clarifications, and they required the model to do some really interesting things.
Give us an example.
Sure.
So in the actual ledger document, right, what we're dealing with is a series of kind of entries
that are made in a daybook.
So this is as people come into a store, they're buying things, and it's being recorded just like on a cash register sheet.
And in the one case that I was in particular looking at here, what it basically says in one of the entries is Samuel Slit came in on the 27th of March, and it says, to one loaf of sugar, 1 4 5, at 1 4, 0 19 1.
And what that means when you actually break it all out is that this guy named Samuel Slit came
into the store. He bought one loaf of sugar. If you're not aware, in the 18th century, sugar comes in
hard conical shapes and they break off pieces and they sell it to you. And it says one, four, five,
sold at one shilling four pence per pound. And then the total is zero pounds, 19 shillings,
and one pence. And this is the old kind of notation, right? And what I saw in the actual model's
response, though, was that it had figured out that in fact it was one loaf of sugar
measured out at 14 pounds, 5 ounces of sugar, sold at one shilling four pence, and then the
total, right? And what's significant about that is that in order to figure out that what was written
on the page, just the seemingly random numbers one, four, five, was actually 14 pounds, 5 ounces,
the model had to be able to work backwards from a different currency system with a different base.
The thing that makes that important is that models shouldn't be able to do that, right? These
models, the way they're trained, is basically pattern recognition. What they're trying
to do is they're trying to predict the next token. And so the first problem here is that predicting
numbers is actually very difficult for models to do, right, in the sense that the model has
no idea whether Samuel Slit is buying 14 pounds, five ounces or 13 pounds six ounces, right? I mean,
that's a random number effectively. It's not probabilistic. The other problem is that although there
would be, you know, a lot of material in the training data that would relate to this kind
of old currency system, the reality is there's not that much of it in terms of the actual
percentage of the material that's there, because there's so little of this that's out there in terms
of the, you know, the overall sum total of all the records that exist. And so when we're thinking
about it, the model is having to do some interesting things there. What it looks like to me is
it's a form of symbolic reasoning. I have to know in my head that I'm dealing with different
units of measurement, which don't have a common kind of base to multiply or divide by.
And then I have to kind of abstractly realize that these units of measurement are, in fact,
comparable as long as we do some conversions, and we have to then move them around
in our heads to figure it out. This is something that I had to think about for a second before
realizing that, in fact, the model had done something that was mathematically correct and unexpected.
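(To make the conversion Mark is describing concrete, here's a minimal worked check of the arithmetic. The only inputs are the figures from his description of the entry: a price of one shilling four pence per pound and a total of zero pounds, nineteen shillings, one pence. Working backwards from those two numbers gives exactly 14 pounds 5 ounces.)

```python
# A worked check of the ledger arithmetic described above, showing why
# reading "1 4 5" as 14 lb 5 oz is the mathematically consistent answer.
# Pre-decimal currency: 1 pound = 20 shillings, 1 shilling = 12 pence;
# weight: 1 pound (lb) = 16 ounces.

price_per_lb_pence = 1 * 12 + 4          # 1 shilling 4 pence = 16 pence per lb
total_pence = 0 * 240 + 19 * 12 + 1      # 0 pounds 19 shillings 1 pence = 229 pence

weight_lb = total_pence / price_per_lb_pence   # 229 / 16 = 14.3125 lb
whole_lb = int(weight_lb)                      # 14 lb
ounces = round((weight_lb - whole_lb) * 16)    # 0.3125 lb * 16 = 5 oz

print(f"{whole_lb} lb {ounces} oz")            # 14 lb 5 oz
```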
So what are the implications for you in your work of a model being able to do this kind of abstract reasoning?
Yeah. And so as an historian, what it means is that assuming that this replicates once we start to see the actual model come out,
you're going to be able to trust the models to do a lot of stuff that historians would normally need to do.
Right? So it's one thing to transcribe a document. It would be another to say, here's a ledger, go through and add up all the sugar that was bought and sold in this ledger. And right now, you can't trust a model to do anything like that, right? You can't trust it to necessarily recognize sugar, come up with quantities, do that type of math. If we're getting to a point where models can begin to do that, you can begin to get them to do tasks that would take humans a very long time.
Right. It sort of sounds like the equivalent of the moment where, like, AI coding tools
went from being a useful assistant for a person who's a professional programmer to
like actually being able to go out and program things just on their own with very minimal
instruction. It's like that for history, right? Yeah, and I think that's a really good example.
But I think that the interesting thing about history here is that it's a very typical
kind of knowledge work area, right? In the sense that a lot of the stuff we're doing is
pretty esoteric and your listeners will probably be wondering, you know, who's really interested in
how much sugar people bought in Albany in the 18th century.
Well, Casey is, but he's a special case.
Yeah.
Yeah, that's fair.
I'm really interested in this Samuel Slit and why he needed 14 pounds of sugar.
Like, take it easy, Sam.
It's true.
Well, he's a merchant.
He also wants to go and sell it to other people, right?
Oh, he's a dealer.
There we go.
He is a sugar dealer.
Yeah.
But the interesting thing about this, I think, right, is that the stuff we do as historians with
these historical records, this is what all knowledge workers do, right?
You take information and you synthesize it,
you take it from one format, you put it into another,
you realize the implications of the things that you're reading,
and you draw conclusions and analysis based on that, right?
And it can be 18th century sugar,
but it can very easily be any other kind of widget that a knowledge worker uses.
So what I'm seeing turning on here for historians
is highly likely to start turning on in other areas as well,
that up to this point in the models,
you know, we've been getting the sense that they're starting to get good enough where you can feel
like, yeah, I think I can trust the outputs on this, but you're getting to the point where it just,
it works. And as somebody who uses coding assistance all the time now, that is, it's a very similar
situation where you used to have to cut and paste back and forth and it would, you know,
it would never run the first time. You'd have to run it, you know, three or four times,
pace the errors back and forth and eventually it would work. And now you can just kind of hit the
button and it almost always works, right? And that's what we're going to see here with knowledge work.
So I want to zero in on what makes this so interesting. So we don't know at this moment that this is Gemini 3, but I think Kevin and I feel like it's highly likely to be Gemini 3, right? And we also don't know a lot about how, if it is Gemini 3, exactly how it was trained. But I think we can assume that it was trained in a way that its predecessors were, which was in part by just feeding it lots more data, lots more compute, right? Just sort of following the scaling laws. And there's been so much debate over the past year about are we seeing diminishing returns, right?
Have we sort of figured out the limits of what we can get out of these scaling laws?
The story that you're telling us, Mark, is a suggestion that, no, we have not gotten everything that there is to be gotten out of this increased scaling.
And in fact, we should expect to see continued emerging properties from this ongoing scaling.
And you've just given us an example of it right there.
So that's why I think this is so fascinating.
Yeah, and I was fascinated by this experiment.
And I wanted to see if I could actually get to the bottom of what happened here.
So I asked some folks who would be in a position to know, like, hey, there's this, you know, history professor in Canada.
He thinks he, like, stumbled onto this, like, unreleased Gemini 3 A-B test, and it was really good.
And they said, lose my number.
No, they were very tight-lipped.
They did not want to talk about it.
They are keeping things very secretive over there.
But I was able to confirm that Google does test new models in AI Studio
before they sort of appear elsewhere.
And so I think if I were a betting man,
it's a pretty good bet that what you experienced
was, in fact, an unreleased model,
probably Gemini 3.
So, Kevin, I have not been in the AI studio myself recently
to see if I could try this model.
Have you made any efforts to try to access
whatever this model is?
Yes.
So I use AI Studio.
People don't know this,
but, like, Google has, like, you know, 800 AI products right now; there are, like, you know, a billion ways to use Gemini.
And the most effective way, the best way to use Gemini is inside this product that basically no one except, you know, developers and nerds like us uses, which is called the Google AI Studio.
And if you go in there, I don't know, for whatever reason, Mark, do you find this too?
But, like, the model, like, the version of Gemini in AI Studio is better than the one, like, on the web.
I don't know why.
But I'm consistently able to get AI Studio to do things like transcribing long interviews that the regular old Gemini won't do.
So anyway, I was in there this morning actually doing some research for our segment about Suncatcher, this, like, Google project about putting AI stuff in space.
And I was trying to have it summarize this research paper and give me some ideas and comparisons to what other companies are doing.
And I got this A-B test, this like, you know, choose between these two answers.
And I am looking at it right now.
It says, which response do you prefer?
It has these two side-by-side things.
And they basically both look pretty good.
I think the problem I'm identifying, Mark, is that unlike you, I am not smart enough to come up with, like, problems that are challenging enough where the difference between one pretty good model and a very good model is readily apparent.
So maybe you can help me with that.
Well, I mean, here's an idea. I know Mark really focuses on the 1700s and the 1800s in the fur trade. What about the 1500s? I bet you can make a debt.
Yeah. Well, I will, I'll look into that. All right. Well, totally fascinating experience. And I can't wait to hear more about what you're doing with AI and history. This is a really interesting mystery that I hope we've shed some light on. Thank you, Mark.
Thank you very much for having me.
Hard Fork is produced by Rachel Cohn and Whitney Jones.
We're edited by Jen Poyant.
Today's show was fact-checked by Will Peischel and was engineered by Chris Wood.
Original music by Elisheba Ittoop, Marion Lozano, and Dan Powell.
Video production by Sawyer Roque, Pat Gunther, Jake Nicol, and Chris Schott.
You can watch this whole episode on YouTube at YouTube.com slash Hard Fork.
Special thanks to Paula Szuchman, Pui-Wing Tam, Dalia Haddad, and Jeffrey Miranda.
You can email us at hardfork@nytimes.com with what else you think we should build in space.
Thank you.
