Hard Fork - Data Centers in Space + A.I. Policy on the Right + A Gemini History Mystery

Episode Date: November 14, 2025

This week, we talk about Google's new plan to build data centers in space. Then, we're joined by Dean Ball, a former adviser at the White House Office of Science and Technology Policy. Ball worked... on the Trump administration's A.I. Action Plan, and he shares his inside view on how those policies came together. Finally, Professor Mark Humphries joins us to talk about a strange Gemini model that offered mind-blowing results on a challenging research problem.

Guests:
Dean Ball, senior fellow at the Foundation for American Innovation and former White House senior policy adviser for artificial intelligence and emerging technology
Mark Humphries, professor of history at Wilfrid Laurier University

Additional Reading:
Towards a Future Space-Based, Highly Scalable A.I. Infrastructure System Design
What It's Like to Work at the White House
Has Google Quietly Solved Two of AI's Oldest Problems?

We want to hear from you. Email us at hardfork@nytimes.com. Find "Hard Fork" on YouTube and TikTok. Subscribe today at nytimes.com/podcasts or on Apple Podcasts and Spotify. You can also subscribe via your favorite podcast app here: https://www.nytimes.com/activate-access/audio?source=podcatcher. For more podcasts and narrated articles, download The New York Times app at nytimes.com/app.

Transcript
Starting point is 00:00:00 Casey, what's going on? Oh my gosh. So the other day, I'm walking down Market Street, and for context, this is like, you know, maybe like one of the main thoroughfares in San Francisco. And over the past year, this is a little bit obnoxious, but I would say four or five times someone has recognized me from the podcast and stopped me and wanted to take a picture. It always makes my day. Hard Fork listeners are the best. It had happened to me just the previous week.
Starting point is 00:00:25 Well, then this weekend, I'm coming home from the gym. And you know how you are when you're coming home from the gym. Your face is flushed. Yeah, you're sweaty. You're sweaty. Your hair's, you know, all over the place. And this very sweet young woman comes up to me and asks for a picture. And, of course, I'm thinking, I kind of look, you know, gross right now, but anything for a Hard Fork listener, right?
Starting point is 00:00:44 And she's there with a guy who I assume is, you know, her boyfriend or her husband. And so, you know, I put on a show and I'm introducing, you know, hey, and, you know, what's your name and all that? She hands me her phone and they go and they stand up against the street with her back's turn, you know, so they can get kind of San Francisco. in the background, and that's when I realize these people have no idea who I am. They are just tourists, and they want a picture of themselves in San Francisco. I'm Kevin Roos, a tech columnist at the New York Times.
Starting point is 00:01:16 I'm Casey Newton from Platformer. And this is Hard Fork. This week, Google's crazy new plan to build data centers in space. Is this the final frontier of the AI bubble? Then, former Trump White House policy advisor, Dean Ball tells us what Republicans really think about AI. And finally, it's a history mystery. Professor Mark Humphreys is here to talk about how
Starting point is 00:01:35 an unidentified new Gemini model offer mind-blowing results on a challenging research problem. It was about Canada. It was not about Canada. It was basically about Canada. It was about sugar. It was about the sugar trade in Canada. Good fair. We are going to start by talking about space. Finally, the final frontier, some call it. Yes, because I have been looking into this story that I have become obsessed with, which is that we are going to build freaking data centers and put them in space.
Starting point is 00:02:15 I'm very excited to talk to you about this. I would say I have sort of been skimming the headlines, so I have a lot of questions for you about this. But I think whenever we can start an episode in space, that is a great place to start because I don't know if you looked around lately, but who wants to be on planet Earth, I like an alternative. I'll say that much. Yes. So this has been a thing that has been quietly percolating in the tech industry. Obviously, we have this giant data center buildout going on here on Earth.
Starting point is 00:02:46 Every company wants to build these giant data centers, fill them with these GPUs, use them to train their AI models and do things like that. As you may have noticed, it is not easy to build data centers here on Earth. No, I've tried. I got nowhere. I mean, I felt like I was building IKEA furniture. It was like, you want me to do what? And you need land, you need permits, you need energy to power the data center. You need to do all of this relatively quickly, and people sometimes get mad when you try to put up a data center where they live.
Starting point is 00:03:21 Also, we are facing an energy crunch for these data centers. There is literally like not enough capacity on our terrestrial. energy grid to power everything. That may get worse as people demand more and more AI and the growth continues exponentially. Yes. So a couple companies, including just recently Google, have now announced that they are exploring a data center in space. Which sounds like a joke when you said, like any, building anything in space seems so impractical, so expensive, so doomed to failure, that it truly does just sound like a joke. But what you're saying to me right now, Kevin, is that there is a legitimate, serious plan to try to do this. Yes. I also thought this was like some kind of crazy
Starting point is 00:04:08 science fiction moonshot thing. And it is like an experimental thing. No one is doing this like today. But Google has put out a paper on what it calls Project Suncatcher. Yes, suncatcher, which sounds like a lost Led Zeppelin single, but is somehow a project to build Data Centers in Space. Yes. So they're calling this a moonshot. They're saying, you know, this might not happen for several more years, but this is an active area of research for them.
Starting point is 00:04:36 There are a couple other companies that have been doing this. Jeff Bezos, Eric Schmidt, other sort of big tech folks are really interested in this idea. And I think we should talk about it today just to kind of give people a sense of like what the future may hold if we continue to demand all of this power and all of these data centers to run these giant AI models. Yes, I think it is so worth talking about because among other things, it indicates that we are at the stage of this bubble where people have come to feel like we cannot provide enough electricity for the future we want to build on the planet that we live. We actually have to get off the planet to realize our ambitions.
Starting point is 00:05:16 So if nothing else, that just tells you how ambitious these companies are getting and the crazy big swings that they're about to take. Totally. So where should we begin? Well, let's talk about Project Suncatcher first. What exactly is Google proposing to do? And what did it say about it last week? So this was a blog post and a paper that came out last week. They are calling this a future space-based, highly scalable AI infrastructure system design.
Starting point is 00:05:48 And basically, they have started doing some testing to figure out if a space-based data center would actually be possible. And the problem that they're trying to solve here is twofold. One, as we mentioned, it's very hard to build stuff here on Earth. You need all the permits and approvals and energy. The second is, like, the sun is a really freaking good source of energy, right? It emits something like 100 trillion times as much energy as the entire output of humanity. But building solar panels on Earth has some issues, mainly the sun sets for half the day, so you can only get power for half the day. Which has long been one of people's primary criticisms of the time.
Starting point is 00:06:29 Yes. But if you put the solar panels and the data centers into low Earth orbit, and you put them on something called the dawn, dusk orbit path, which I did not just look up this week. I definitely knew what that was from my high school astronomy class. You can effectively give them nearly constant sunlight, and the solar panels can be much more productive, up to eight times as productive as solar panels here on Earth.
Starting point is 00:06:55 So let me ask you this, because when you say data center, I picture one of these, like, giant anonymous, you know, office complexes that's like the size of six, you know, football fields that they're, you know, building all over the heartland right now. I assume that they are not going to build something like that in space. No, these would be, if you look at some of the mockups that some of these companies, there's another company called StarCloud that's sort of like a startup that's got some funding from Nvidia. And if you look at the mock-up that they have made, it kind of looks like a giant bird, but, like, the wings are these, like, very thin solar panels, these sort of, like, arrays of solar panels, and the kind of, the center of it is kind of this, these clusters of computers, essentially, and it's just kind of out there orbiting in space, and the wings are kind of catching all of the sun, and they're feeding that energy into the computers at the center of,
Starting point is 00:07:55 the cluster. Got it. So we're in one of these giant terrifying bird-like structures that are sort of swarming over the earth in this future. And they're getting so much energy from the sun and it's so efficient. And that is sort of driving all of the compute that's happening inside the computers. How does whatever is happening inside the giant terrifying bird get back to us down here on Earth in a timely fashion? That's a great question. And I asked this to a couple people I talked to over the past week or so who've been working on this stuff. And what they told me is this is actually not that much different from something like Starlink, right? You're sending data from a satellite or a series of satellites back to Earth. It's not that far away, right? It's not like these are light
Starting point is 00:08:37 years away. It's like it might take, you know, a couple more milliseconds than you would take to transmit something here on the Earth. And that is actually something that we know how to do. Got it. Okay, Kevin. So last week, Google puts out a blog post about this. Give us a sense of where they are in this experiment. So I would say they feel like they are pretty early in this process. There are still some technical barriers to overcome, and we can talk about those. But they have started actually running tests to figure out things like,
Starting point is 00:09:07 well, if we send our TPUs, our AI training chips out into space, like will they just sort of fall apart because of all the radiation out there? And they actually did an experiment that they described in this paper where they took just a normal like TPU, like the kind that they would put in their data centers here on Earth, and they like took it to a lab and they hit it with a proton beam that was supposed to like simulate a very intense kind of radiation that these chips would experience if they were floating out in space. And they found that their newer TPUs actually withstood radiation much better than they thought. So these things can apparently handle
Starting point is 00:09:43 radiation well beyond what's expected of them in a five-year mission. Now, if you watch the Fantastic Four First Steps earlier this year, you know that cosmic radiation, is what transformed the Richard's family and Ben Grimm into the Fantastic Ford. Has Google addressed that at all about sort of any of those concerns? They did not address that, to my knowledge. They did address some other potential hurdles.
Starting point is 00:10:08 One of them is, like, if these chips glitch out or break, how do you fix them if they're in space? And I asked a couple people who have worked on similar projects, and they basically said, yeah, we got to figure out how to, like, get robots up there to, like, fix the data centers. Got it. So they'll focus on using robots for that. I guess that makes sense. Now, am I right that Google is actually planning to do some kind of like test launch within the next couple of years on this?
Starting point is 00:10:35 Yeah, they are planning to test this in 2027 by launching two prototype satellites in partnership with Planet, which is a company that sort of sends up these little tiny satellites into space for mapping and things like that. And that is their plan. There are also other companies, including StarCloud, which is also planning to send up some prototypes pretty soon. So they're moving forward with testing on this. I will say, I think this is probably not going to happen in any real way for at least a couple of years, in part because things are still very expensive to send up into space.
Starting point is 00:11:15 It is not right now economically feasible to send up a whole bunch of chips and a whole bunch of satellites up into space. It costs many times more than what you would need to build a comparable data center here on Earth. Yeah, and people here on Earth are saying that building the data centers that we're doing here on Earth are not economically feasible, right? So I can't imagine how much more out of control the costs are going to be once you leave orbit. One thing I thought was interesting in the Google blog post was that the company tried to play Suncatcher in the lineage of self-driving cars, so what is now Waymo, and quantum computing,
Starting point is 00:11:53 which hasn't quite become a mainstream technology yet, but has made a lot of strides. You know, just within the past year, we did an episode on it not all that long ago. And they're sort of saying, like, Suncatcher is kind of one of those, where we are willing to work on this for 8, 10, 12, 15 years to make it into a mainstream technology.
Starting point is 00:12:13 And so I took that as Google saying, like, hey, this is not just like some, crazy little experiment that a couple engineers are working on in their spare time. It seems like they're serious about this. I think they're serious about this, and I think they are looking out to a future five, 10, 15 years away
Starting point is 00:12:30 where kind of the demand for AI and AI-related tasks is just essentially infinite, right? It's like this is not something that 10% of people are using every day. This is something that 100% of people are using
Starting point is 00:12:46 constantly, that there are like sort of entire companies or sectors of the economy that have been sort of fully turned over to AI. And maybe that happens and maybe it doesn't. But if it does happen, we're going to need a lot of energy and a lot of data centers and we may run out of land and power here on Earth. Now, something that I did not realize until after I had read about Suncatcher is just how many other companies are looking at doing the same thing. Can you kind of give me a high level overview of like who else is playing here? And does it seem like anyone else is further along than Google is right now? Yeah. So as I mentioned, there's this company StarCloud, which is a Y Combinator startup that got some funding from Nvidia. They are sort of the main
Starting point is 00:13:34 ones here doing this. There's also a company called Axiom Space that is doing this. And we think that there are some Chinese companies, or at least one Chinese effort to do a space-based data center, although they've been a little bit vague about the details there. And then the information had an article about some comments that Eric Schmidt and Jeff Bezos have made suggesting that maybe they are also interested in or looking at doing something like this. Well, you know, Jeff Bezos just put Lauren Sanchez in the space. So you have to wonder if that was kind of a first step towards something in this vein. Yes.
Starting point is 00:14:13 You know, one thing I think that is interesting about this approach, Kevin, is that, as you know, we've seen an increasing amount of resistance from people in sort of local communities to having data centers put in their towns or near their towns. They're worried about how it's going to affect the cost of energy for them, right? They're worried about water usage or the environmental impact. And so I think that, you know, if this sort of thing comes to pass, we'll have gone from like, you know, just like the nimbies saying not in my backyard to this new group of people that I'm calling the noms that are saying not on my planet, you know, and they want all the data centers just built up in the sky. So do you think noms are going to become a sort of major political force? I do. Although I also think that eventually people may start to start to not want them in space either. But it's going to be harder for them to protest. You got to get in a rocket, go up there into low earth orbit. It's very inconvenient. Now, why wouldn't people want them in space?
Starting point is 00:15:10 Well, there are various people who think that this is going to create a lot of, like, space debris and things like that, that would eventually be bad. I talked to some folks who, you know, work on this stuff, and they were like, they don't think that's really going to be a big deal. There's all kinds of stuff up in space now. We generally don't pay much attention to it. But I can see this sort of sounding to people like Elon Musk, you know, proposing to build colonies on Mars or something. Like, it's just like, it's like too futuristic, it's too sci-fi, and it sounds like these very, you know, rich companies and individuals trying to kind of flee from their problems here on Earth by, like, sending stuff into space. Here's what I would say. I would love to be, like, living at a time when one of the top ten concerns I had in my life was space debris.
Starting point is 00:15:57 If I ever get there, Kevin, I will be in heaven. Heaven! Well, you'll be in low Earth orbit, technically. Exactly. Now, I have a question for you. Yeah, yeah. Would you go to space? Yes, absolutely.
Starting point is 00:16:13 Would you go to space to fix a data center? I mean, what is the salary for that job? Very high. I mean, there's probably a certain price for which I would do it. But here's the thing, you know, I'm not handy around the house. Yeah. It's like, if I, you know, if chat GPT doesn't know what to do, I'm calling the handyman. Yeah.
Starting point is 00:16:31 Okay. I will just say that I think we should make a, an offer to Google, which is if you guys get this project Sun Catcher up into Lower Earth orbit, we will do a podcast episode where we go up there and cut the ribbon. You're just dying to be exposed to massive levels of solar radiation.
Starting point is 00:16:49 You know, I just think it'd be fun. When we come back, the ball is in our court. Dean Ball talks about how we crafted the AI action plan. Well, Casey, recently we've been talking about some state-level AI regulations that have been passed and signed into law. But today we're going to have a discussion about now. national AI policy.
Starting point is 00:17:36 Yeah, I think that the states have been acting because the federal government has not really passed any legislation related to AI just yet. And that's left us with a lot of questions around how the administration has been thinking about AI. It's been a little confusing. I think especially, you know, in this administration, it has not been particularly clear to me what President Trump and his allies believe about things like whether or we are headed towards some kind of an AGI moment or how the federal government should try to
Starting point is 00:18:11 protect against some of the risks of very powerful AI systems. So the conversation that we're going to have today, I think will help us answer some of these questions and just kind of get a better sense of like what is happening in Washington, especially on the right, when it comes to AI and AI policy. Yeah. So earlier this year, Dean Ball spent several months working as the White House's senior policy advisor for artificial intelligence and emerging technology. He was brought into the White House in order to lead the drafting of the White House's AI action plan. And in that role in the White
Starting point is 00:18:44 House, Dean not only got to see how the AI policy sausage was made at the highest levels of government, he actually got to make the sausage himself. He was sort of responsible for taking all these different ideas from the various parts of government and putting them together into a document that would represent the administration's sort of official view on. AI. Yeah. And while he was there, Dean also got a good sense of who are the various factions on the right when it comes to AI policy. What do they believe? What are the competing incentives? Who has whose ear? And I think if you want to understand the likely path forward for AI regulation over the next few years, that's a really important part of the conversation. Yeah. So Dean left
Starting point is 00:19:27 the White House in August after the AI action plan was released. And since then, he's because, a senior fellow at the Foundation for American Innovation and the author of hyperdimensional newsletter about AI and policy. And because we're going to be spending a lot of time in this segment talking about AI, let's do our disclosures. I work for the New York Times, we're just suing open AI and Microsoft over alleged copyright violation. And my boyfriend works at Anthropic. Let's bring them in. Dean Ball, welcome to Hard Fork. Thank you both for having me. It's so good to be here. So how did you end up at the White House earlier this year working on AI policy?
Starting point is 00:20:08 What was your background before that? I was a think tanker. A lot of it was not tech policy. A lot of what I did was state and local policy. But I was always very interested in tech. And basically, when the AI policy conversation really took off sort of early 2023, I made the decision to start writing about AI. Basically, as a part-time gig, just like purely on the side.
Starting point is 00:20:33 wasn't being paid for it or anything. And then eventually I decided I really liked it, and I was finding my voice, and I was hired by the Mercatus Center at George Mason University to go spend some time there, spent about a year there, and then was recruited to the White House on the basis of primarily my writing on substack, and my substack is called hyperdimensional. It's where I talk about, you know, AI stuff. The substack to White House pipeline, I feel like that is, you are not the only person who has posted their way into a job.
Starting point is 00:21:03 in the federal government? You can post your way to the federal government. It's really true. And probably, I'm probably like a big chunk of it was probably my posts on X, really, which is maybe even more scary. But, yeah. So, okay, you get this call. You go to the White House.
Starting point is 00:21:21 What did you find there with respect to AI policy? Was there like a coherent single view of how AI should be, governed and regulated? I would say there are coherent intuitions, but the field is so nascent, and there haven't been a lot of fights where dividing lines have really firmed up yet. I think, by the way, this is true on the left as well.
Starting point is 00:21:52 I don't think that those intuitions have formed yet into like a lot of different sort of very specific policy positions. I don't think they've concretized. yet is really what I'm saying. I think though, you know, there's there's a combination of excitement and some worry and some confusion, probably equal parts, which is, you know, in a macro sense, that's probably roughly where I am too, actually, and that sounds about right to me. You say there were some coherent intuitions about AI in the administration. What were those intuitions? I think coherent intuition number one is AI is the most important.
Starting point is 00:22:33 technological, economic, scientific opportunity that this country, and probably the world at large, has seen in decades, and quite possibly ever. I think basically everyone shares the assessment. This is going to be extremely powerful, and it's going to be really important. And second intuition that directly follows is there are going to be some risks associated with this that are sort of familiar to us and things that are cognizable under existing sort of policy frameworks and others which might be more alien and might be like risks that we don't really even have concepts for as clearly yet and then you know maybe
Starting point is 00:23:14 the third intuition is regardless of those risks it feels like AI is going to play a very big role in the future of like American global leadership yeah that's really helpful and and kind of helps me get a sense of like the lay of the land when you arrived. I'm wondering if you can help me understand the kind of intra-right factions when it comes to AI, because I've, I think I've identified like at least two different views of AI that I've heard coming from prominent Republicans. And maybe you could call them like the David Sachs view and the Steve Bannon view. David Sachs, the president's AI czar, is constantly talking on.
Starting point is 00:24:01 online and on his podcast about, you know, these AI doomers who we think are sort of ridiculous and are overhyping the risks of AI and trying to sort of, you know, get their way on policy, calling them woke, implying that they're sort of trumping up these fears of, no pun intended, of job loss and things like that to sort of get their way when it comes to policy. Then there's Steve Bannon, who has been, you know, out there talking about the existential risks from AI. And you and I were both at this curve conference, actually all three of us were there a few weeks ago, where one of Steve Bannon's sort of guys was there and gave this very fascinating talk about how he thought, like, he was sort of in league with the so-called Dumers who believe that this could all go very badly very soon. Are there more views on the right than those two, are those sort of the primary camps? No, I think that there's a whole spectrum.
Starting point is 00:24:59 I can't speak for either David or Steve, of course. would put them on like roughly polar opposites in terms of how, you know, about how conservatives talk about this issue. But I think there's a whole spectrum in between. So first of all, you've got national security people. You're national security people who don't actually know a ton about, and this is, again, both sides here. You know, they're just, they think of this as a strategic technology that's important for U.S. competition with China and other things. And also maybe they think there's some national security risks, but they're not really thinking about, like, the domestic policy. They're not really thinking about regulation. They're not thinking EA versus
Starting point is 00:25:38 Dumer. So that would be one. I think also, you know, related to the sort of Bannon viewpoint, but maybe, you know, more toward the middle would be like people that are worried about kids safety primarily. There's a lot of conservatives who would distance themselves from the AI Dumer view, but who would also distance themselves from the pure accelerator. vision, and they would use the lessons we've had with social media as an example. So sort of that kid's safety viewpoint, for these people, very often the issues of things like LLM psychosis, of course, teen suicidality with chat pots being another very salient issue for this group, for everyone, I hope.
Starting point is 00:26:22 But yeah, there are others in between, and I guess I would put myself somewhere in kind of the middle and a weird, weird fusion. Where does industry fit into that spectrum? Like, my sense from the outside is that industry groups and lobbyists have had a lot of success in this administration in getting what they want. Where are they in those conversations? I think it really depends on incentives. People in policy conversations very often will refer to, like, industry as being this
Starting point is 00:26:54 kind of monolithic, coherent entity. It's, of course, not. And there's different people that have different incentives. So, you know, if you're a U.S. hyperscaler, you don't hate the export controls, you know. You don't want more competition for the same chips that you're trying to buy. Meaning like Microsoft or Google or an Amazon. Yes, Microsoft, Google, Amazon Web Services, et cetera. You don't hate that because, like, A, you don't want Chinese firms competing for your chips.
Starting point is 00:27:21 But even if it's not the same chips you're competing over, you don't want to be implicitly competing over space at TSM, to make the chips. So, you know, hyperscalers, you know, they will definitely have, like, nuanced positions on export controls, but by and large, like, their incentives are not to hate them, and they largely don't. Um, Frontier Labs, I mean, they want to make money selling tokens to people. So they want access to chips. Uh, but, you know, I think there's some people who believe, and it's from a political theory perspective, it's not wrong to believe that, like, ultimately they want to create moats. And I think there's a lot of ways you can make moats. It seems to me like the main way they're trying to make moats right now is through infrastructure, that they've
Starting point is 00:28:04 basically all come to the Anthropic today and asked a $50 billion commitment to build their own data centers. Google obviously does this. Open AI does this through Stargate. Meta does this. XAI does this. Everyone does this. Everyone's building infrastructure. And the basic view is like, well, the models maybe are not your moat per se, like the parameters of the model are not your moat, but perhaps the infrastructure is. And so, you know, these are all competing interests and no one's making illegitimate arguments here. Everyone's operating from incentives. And, of course, the job of government is to sort of solve for the equilibrium.
Starting point is 00:28:35 Dean, is there a MAGA view of AGI? Not yet. No, not really. I don't know that there's in any political persuasion view of AGI. I think MAGA might actually be the closest to having one. And I think it's at the moment, maybe the persuasion. at least from what I see online is, like, maybe it's sort of more dumery. I believe we saw a bipartisan bill introduced over the past week that would require
Starting point is 00:29:00 reports of job losses due to automation, which suggests that there is some increasing attention to that likelihood. Yeah, well, I mean, so there's this big question, you know, in the AI field, like places like the curve and places like a Lighthaven, there are these gatherings of various sort of doyens of the AI community, and they get together. And the main question that people talk about is like, when are the pitchforks going to be out for this technology and what is going to cause the pitchforks to come out? And I have come to the conclusion that rather than it being a singular issue, it's going to be this kind of miasma of issues. It's going to be like, you know, it's sloppification.
Starting point is 00:29:46 It's not safe for kids. It's driving up your electricity prices. It's using all the water. It's taking your job. And it's taking your job, and also it's going to kill everyone, and also, by the way, it's fake. It'll be all those things and kind of this weird sort of vishy swath. The aspect of the AI action plan that I find the most annoying is the attention on the ideology of the chatbots and the suggestion, you know, that they should be able to, you know, respond in some ways, but not in other ways. Could you kind of illuminate the discussions that we're being had and what the administration actually wants out of these models?
Starting point is 00:30:28 Yeah. So I think the main point here, first of all, like the most important thing, you're talking about the woke AI executive order. Yeah. What it is, so it's traditionally phrased. This is an executive order that deals with federal procurement policy. In other words, this is not an executive order. It's not a regulation on the versions of AI models that a company like Anthropic or Open AI or any other company ships to consumers or private businesses. This is purely about the versions of their models that they ship to the government.
Starting point is 00:31:05 And the government is saying in this case, we do not want to procure models which have top-down ideological biases. engineered into them, we would like our government employees to have access to models which are, you know, I think objective is a really hard word. Obviously, we've been, like, debating about, like, what is truth for, you know, since there was language, right? So I don't think we're going to resolve that. I have a feeling the General Services Administration guidelines will not resolve that issue. You know, I think it's folly to even try. And I think the executive order doesn't try, you know, the executive order steers clear of doing so. The executive order says, Instead, you just, we don't want you as the developer imposing some sort of worldview on top of the model.
Starting point is 00:31:57 Well, good luck with that, I guess. Well, I want to ask one follow up on that because my sense is that, you know, the Trump administration and Republicans in Congress have been very upset with how the Biden administration sort of jawboned, how they applied pressure to social media companies to take down, you know, misinformation or what they considered misinformation about the COVID vaccines or things like that. That was seen as like very inappropriate. In fact, they're like ongoing investigations of the contacts between the Biden White House and the social media companies over this issue. Yes. And then we turn around and we see this like woke AI executive order where it's like, I understand the subtle point you're making about, you know, this is not regulating the models that the companies are releasing to the public. It's just the ones that they're selling to the government.
Starting point is 00:32:47 but like we all know that there's there's one set of models right and they get they get built and they get sold to various customers and I think you know it's reasonable to see that and think okay this is the Trump administration doing exactly what it got so mad at the Biden administration for doing which is to contact the tech companies and tell them hey this is how your product should be working this is the kind of things that should be allowing and not allowing and I don't No, no. Does that seem at all to you, hypocritical? Well, so, look, I think that there is an inherent tension here, and this is a tension that has existed on the right, and it's particularly existed sort of post-Trump 45, post-President
Starting point is 00:33:29 Trump's first term. There is this argument that exists of should we stick to our principles that the government shouldn't be doing this kind of job owning, or should we accept that the government has this power, and now we need to throw it back? at the left, right? And I can tell you that I personally have always definitively been on one side of that argument, which is the form review. We should stick to principles.
Starting point is 00:33:56 We should not fight, we should not. No job boning from anyone. Yeah, you shouldn't do that. I mean, like, you know, you shouldn't do that. At the same time, I think the government totally has a right to say, and again, what we're talking about here, like I wouldn't think of this as like a model training thing.
Starting point is 00:34:13 I would think if this is the sort of thing that can be relatively, like trivially easily changed by the developer, right? So models that are sold to the government already have compliance burdens that are significantly higher than this executive order, right? They have to comply with the Freedom of Information Act.
Starting point is 00:34:30 They have to comply with the Presidential Records Act if they're sold to the White House. There's all sorts of data stewardship laws that are way more difficult than anything in the Woke AI executive order. The WOKI executive order basically says, like, you need to disclose in the procurement process to the agency,
Starting point is 00:34:46 from whom you're procuring, you need to disclose, like, what the system prompt is. You can change a system prompt for a specific customer. It's not that hard. And I would only point out that, like, I will just say it here right now, that, like, if you did try to use federal law to compel a developer to change the way they train the models that they serve to the public, that is unambiguously unconstitutional. It is a violation of the First Amendment. You are violating that company's speech rights, and you are violating the American citizen speech rights who might use that model.
Starting point is 00:35:21 So it would be quite dire and grave for the government to do that, and I am confident that the woke AI executive order was not intended to do that. So, Dean, I really enjoy your newsletter. I've been reading it since before you joined the government. I continue to read it today. And one point of view that you advocate for with great frequency. is that most, if not all, AI regulation should be done at the federal level. And you spend a lot of very valuable time looking into how states are attempting to regulate AI in ways that I think you believe are mostly bad.
Starting point is 00:35:59 Could you kind of give us a high-level overview of your interest in this subject and what you see states doing that concerns you so much? Yeah, so I come from a state and local policy background, I should say. And so, like, my view is that a lot of the real governance in this country happens at the state and local level. And I mostly, now that I live in D.C., I mostly say, thank God that that's the case. That being said, there are some things that inherently implicate interstate commerce. And I think that models which are trained to be served to the entire world, which cost a billion dollars to train, that the standards by which those models are trained and evaluated and measured, you know, I think those have to be federal standards because you can't have competing standards.
Starting point is 00:36:47 Now, maybe we don't end up having competing standards. Maybe what happens is the biggest state regulates, and that happens all the time in America. There's many, many technologies where the state of California or the state of New York or somewhere like that, Texas sometimes, has an implicitly federal effect, one state doing lawmaking. I think that's a failure mode. I think it's an issue of our, a structural issue of our constitution that the founders couldn't really possibly have contemplated because like the notion of economies of scale didn't quite exist for them. And so I think it's a really, really difficult issue of Supreme Court jurisprudence. Right now it's the case that California by default is the central regulator of AI in America. Thus far, I think they've done a better job than I would have guessed, but still not a great job.
Starting point is 00:37:38 So I was, you know, broadly supportive of their flagship AI bill from this year, which was called SB 53. It is a transparency bill that applies only to the largest developers of AI models. And to me, it seems rather reasonable overall. Let me bring it back to maybe some more like contemporary AI concerns, though, which is, you know, earlier when you were describing some of the kind of, you know, landscape in Washington and who's concerned about what you mentioned, there's this group of Republicans who are very concerned about. chatbot psychosis, child safety, teen suicidality. Those are all harms that are present today that seem to be encouraged on some level by products that are out on the market. And we have a Congress that is very loathe to pass really any regulation at all when
Starting point is 00:38:27 it comes to the tech industry, whether that's for ideological reasons or just logistically, it's very difficult to get Republicans and Democrats to agree. Or the government's shut down half the time. That's also been increasingly an issue. And so in such a world, I can very much understand the point of view of a state lawmaker who says, well, I don't want the kids in my state to kill themselves. Like, we're going to do something about this right now. And we're not as dysfunctional as the federal government. So we're going to get in there. And we're going to try to do something. So how do you view that dynamic? And is your desire truly that the states would just say, hey, we're not going to get involved. And that's on Congress. No. So, I mean, look, I understand the incentives of the state lawmakers, like, for sure. I think Congress needs to act. Like, my, my view is more proactive. My view is like, Congress needs to deal with this. This is a problem that Congress needs to deal with. I don't blame the state lawmakers. I blame, sometimes I do. Sometimes I blame them for poor statute drafting. There's no excuse for that, right? Your job. Like, and I say this sometimes to legislators. And they're like, well, we'll let the courts figure that out. And I say, no, you took an oath to the Constitution too, not just the judges. But in the the general case of like, I want to protect kids in my state. No, of course I don't blame them
Starting point is 00:39:36 for that. Yeah. I want to zoom out a little bit and ask a question about AI and polarization. It feels to me right now like AI is kind of in this weird, confusing, pre-polarized state. Like there's this sort of machine that sort of, when an issue gets important enough or salient enough people, it kind of gets run through the polarization machine. And like it comes out the other side and, like, Republicans take one position and Democrats take another position. Do you think something similar is going to happen with AI where, like, it will become very predictable
Starting point is 00:40:10 which view you hold on AI based on which party you vote for? I think what's more likely is that over time, it splinters, and there's, like, different things that people talk about. So there's going to be data centers, and there's going to be, you know, China competition that'll be an issue,
Starting point is 00:40:28 and there'll be, like, the software side, regulation, there'll be the kids issues. Just like today, you know, we don't talk about computer policy or internet policy. We talk about internet policy used to be a thing. In the 90s, internet policy was a thing. But now it's like social media, you know, privacy, whatever else. I think it'll splinter in that way. Will those issues themselves be polarized? Yeah, I mean, in some ways they will be. Yeah. I do hope, though, that there's certain parts of, and this is a very important part of, you know, the action plan in my view, too. The action plan, like, not every single aspect of an issue has to be polarized. There are legitimate tail risk type
Starting point is 00:41:07 events, national security issues that I think it is the obligation of the federal government to deal with in a mature and responsible way. I've heard Ezra Klein before, I love this turn a phrase of his. Who I've never, I've never heard of him. Yeah, we're not familiar with his work. Yeah. as a i've heard him describe government as a grand enterprise in risk management i think that's true in a fundamental sense i think that's very true and um so you there are certain things that we just do need to deal with and the action plan tries to make some incremental progress on some of those things and of course there's a lot of things we need to do to embrace the technology and let it grow
Starting point is 00:41:45 and all that too and i think that's an important part as well but that's less controversial to say as a republican um i think the maybe more controversial thing right now to say is like yeah there are like legitimate risks. And I hope those things can be bipartisan. The dealing with those risks can be bipartisan because really like if we can't deal with catastrophic tail risk, then we do not have a legitimate government. Like the whole point of government is to deal with this issue. And we should just, as Michael Dell said about Apple in the 90s, we should throw the thing out and return the money to the shareholders if like if we can't manage these things. I really do believe that. So let's talk about that point specifically. When I look at AI policy in America today, I mostly see the big
Starting point is 00:42:33 frontier labs getting just about everything they want, right? Like, it seems like there is a high degree of alignment between the labs and the government. And when it comes to, like, safety restrictions, for example, I don't see a lot that is holding them back from, you know, building their next two or three frontier models. So there are components of the AI Action Plan that are meant to address some of those catastrophic risks that you mentioned. Tell us how you envision that actually working. Where is the moment where the industry stops getting everything that it wants?
Starting point is 00:43:10 Well, I would say there's so much you can say here. I think the first thing is that many of the people who work at the frontier labs, I can't speak for the labs, of course, but knowing a lot of them personally, including up to very senior levels, I can say that they have an earnest desire to deal with these problems. And they invest real resources as companies. And part of the reason they do that is because they have incentives, because their companies would be bankrupt if they, e.g., caused a pandemic. Right?
Starting point is 00:43:39 And the other thing is that, like, a lot of these problems are super tractable. Like, we don't have to act as though these things are, like, the hardest. problems we've ever dealt with to me as someone with experience in public policy and by the way this is the posture of like people that I met in government who are 30 year veterans of thinking about tail risks to them you bring up like AI bio risk or AI cyber risk and they're like yeah sounds like a serious risk okay there's a hurricane that's tracking toward Florida let me go deal with that right like um these things come across your desk every day when you're in government these are eminently tractable problems in the near term with current technology and
Starting point is 00:44:19 technology that I think we're going to have in the near future, without spending a ton of money, there's a lot of traction you can get on them that doesn't involve really in any meaningful way slowing down AI development. I want to push back that there's this trade-off between sort of mitigating tail risk and slowing down AI development. Now, will that always be the case? No. At some point, there will be trade-offs. We'll have to make those trade-offs and they'll be hard, and it's like hard for me to know where I'll come down on that because it'll depend on the particulars. But right now, we have this great opportunity of like, oh, we can accelerate AI development and we can also have better biosecurity, which, by the way, was a problem before ChatGPT existed. There was a whole
Starting point is 00:44:59 pandemic about it. So, like, yeah. Sometimes I talk to people who work on AI policy or just, or just, you know, work on AI and think about policy. And they'll say things like, you know, I don't think we're going to get any meaningful AI regulation until there's a catastrophe. for you. Do you, Dean, think that it will take something like that to really catalyze significant movement on AI policy in Congress? Possibly. I mean, like, I can't say that, like, that certainly a catastrophe is plausible and could catalyze movement in Congress, for sure. I think there are other ways to achieve this. I really do. Like, I think you can make incremental advancements in the absence of a catastrophe.
Starting point is 00:45:49 Now, it depends on, like, a lot of people in the AI safety community will say this, or people that are at labs who care about AI safety also. They will say this. That's, like, a very anthropic type of position. And I don't say that as a pejorative, by the way. To be totally transparent. Like, I've heard this from people at lots of different labs where they're sort of like, yeah, I don't really think, like, we're capable.
Starting point is 00:46:15 of, and it's not so much a knock on like this particular Congress or anything. It's just like, I don't think the government is capable of regulating things in advance. I am okay with government being in a mostly reactive posture, particularly with respect to things that aren't tail risk. Tail risks are the one exception because you, you know, those things can be very, very damaging and so you want to do some stuff in advance to mitigate that. But when it comes to like most other harms from AI, I'm comfortable with government just really reacting to realized harms in areas where it's like, okay, well, it's a realized harm that we've seen. We think that's going to continue happening. It doesn't appear to be resolved adequately by the existing system of common
Starting point is 00:47:01 law liability that allows people harmed to sue the people who harmed them. And it can be meaningfully addressed through a targeted law. And if all those conditions are satisfied, then we should totally pass that law. I think kid safety is in this category. Yeah. Yeah. Well, Dean, thanks so much for coming. Really fascinating conversation. And people should check out your writing. Your website is hyperdimensional. It was a real pleasure, guys. Thank you. Thank you. Thanks, Dean. When we come back, we'll have more to say about the Canadian fur trade than we've ever said before. It was not the Canadian fur trade. It was the upstate New York sugar trade. They're related in ways I don't understand.
Starting point is 00:48:02 Well, Scooby gang, it's time to get in the old mystery machine, because today we've got a mystery. That's right, gum shoes. grab your notebook and your magnifying glass because there are a few clues and we're about to crack the case wide open. And this one is a history mystery. It involves an experiment that a historian ran using an AI model. And we're going to talk about it all with the historian in just a second. But Casey, to set the scene here a little bit, there are a lot of rumors going around right now about this new Google Gemini 3 model. There really are. Gemini 2 came out almost exactly a year ago, came out last December, and while Google has updated it throughout the year, we have been hearing an increasing number
Starting point is 00:48:51 of whispers this fall about Gemini 3 and rumors that it really is pretty great. So Alex Heath reported a few weeks back that he expected Gemini 3 to come out in December. And one thing that happens in the run-up to their release of new models is that companies quietly test them, And that brings us to our story today. Yes. So Mark Humphreys is a history professor at Wilfred Laurier University in Ontario, Canada. He does research involving a lot of old documents and trying to decipher the handwriting on these documents. And he is also kind of an AI early adopter.
Starting point is 00:49:27 He's got a substack called Generative History, where he's been writing about his experiments using AI to solve some of his research problems. And recently he had a post that really caught our attention. called, has Google quietly solved two of AI's oldest problems in which he explained a really fascinating experiment that he ran using one of these kind of test models inside Google's AI studio, which is a Google product where you can kind of experiment with different models. And he says that the responses that he got back from this mystery model made the hair on the back of his neck stand up.
Starting point is 00:50:04 Like this was so astounding to him, not just because they were very good, but because because they seemed like a different kind of capability than ones he had seen in any other AI model. Yeah, and so the mystery is what model was Mark using, but I think the bigger story is what does it mean that this historian was as impressed as he was with this very unusual thing
Starting point is 00:50:27 that he found a large language model doing? Yes, and we should say, like, it is very hard to determine exactly which model anyone is sort of being shown at any given time, the way these pre-release tests go. Companies will show 1% of users one model and another 1% of users a different model and kind of ask them to compare the two.
Starting point is 00:50:49 And they give them weird code names. They don't tell you what you're using. Exactly. So there's still some uncertainty around this. This may have just kind of been a one-off. We will obviously need to see what Gemini 3 actually does when it comes out. But for now, I think this is a very interesting story
Starting point is 00:51:02 because it points to the way that these AI models are starting to do things that surprise even expert. it's in their fields. Yes. And so for those reasons, it's time to bring in Mark Humphreys and talk about what he found. Kevin, you know the difference between an American and a Canadian historian? What's that? Canadian historians process data while American historians process data. Is that true? Yeah, that's true. Well, let's talk to Mark, and he can pronounce it however he wants. Hell yeah, brother.
Starting point is 00:51:38 Mark Humphreys, welcome to Hard Fork. Thanks for having me. Where are we catching you today? Are you up in Canada? What's going on up there? I am. I'm in Waterloo, Ontario, in Canada, in my office at the University of our Wolford Lower University. So Waterloo, so you must just be surrounded by AI computer scientists at all times.
Starting point is 00:51:59 There are a lot of startups and a lot of AI researchers and a lot of computer companies in Waterloo, yes. Home of the Blackberry. That's right. That's right. Yes. Rimpark. So before we get into the specifics of your most recent brush with this new mystery AI model,
Starting point is 00:52:15 can you just tell us how you've been using AI in your history research over the last year or so? Sure. So my research partner and I, Leanne Letty, who's lab this all comes out of as well, have been working on trying to develop ways of processing huge amounts of data, mostly handwritten, related to the fur trade. And that involves a couple of things. It involves trying to recognize the handwriting accurately, but it also involves trying to basically generate metadata for all of,
Starting point is 00:52:46 you know, tens of thousands of records to try and understand what's in those records and make connections between them. So we're kind of operating a task that are kind of at the, just at the threshold of what AI models are capable of doing. So it's been kind of interesting to watch over the last couple of years, the models get better and become capable of doing some of these things and then finding out new limitations as we go along. Yeah. And tell us a little bit about the kind of work that you do in general. I know you're really focused on using older documents in your work. What kind of stories are you trying to put together?
Starting point is 00:53:18 Yeah. So, you know, I've always been really interested in stories of ordinary people. So in the fur trade, when you're trying to understand, you know, what happened to ordinary people in the 18th and the 19th centuries, the problem is many of them were literate, didn't write. And although they kind of appear in a lot of documents, that are generated in the kind of course of living. You know, these are marriage, death records, account books, stuff like that. It's a lot of detective work. It's a lot of trying to piece together stories from fragmented documents, what somebody bought in one place, a contract they signed somewhere else, a baptismal record somewhere else. And so a lot of this is trying to do that,
Starting point is 00:53:53 and that's what Dr. Letty and I have been trying to do with our graduate students, is to try and piece together what these stories about ordinary people can tell us about the fur trade and the western part of North America, from about 1760 through until the early 19th century. You know, it's interesting, Kevin, because every time I go to a Starbucks and they try to give me a receipt, I think I don't need any paperwork about what just happened here, right? I'm just going to take my mocha and get out of here. But what Mark is saying is that that document could be of huge value to a future historian
Starting point is 00:54:21 in understanding our lives. Exactly. Yes, they will want to know. Let's get into it. So, Mark, tell us about this experience that you had with Gemini, the AI model that you were trying to use for this transcription, basically taking this very old document about the fur trade and plugging it in and saying, transcribe this. Tell me what this says. Yeah, so, you know, I think to understand why this looks like it could be a fairly
Starting point is 00:54:46 significant development, it's important to understand kind of where we've come from in the last two years on this, right? So when GPT-4 first came out in 2023, it could kind of sort of read handwritten documents. It would be mostly errors, but you could kind of see that it was beginning to be able to do this. And it's been really easy for kind of companies and systems to get up to about 90% accuracy. And then everything above 90% has been pretty difficult. And the problem is that that last 10% is the most important part, right? So that if you're interested in people's names, you're interested in amounts of money,
Starting point is 00:55:21 you're interested in where they were, you've got to get that stuff right in order to make it useful. And up till about, I guess, you know, when Gemini 2.5 Pro came out back last spring, we were kind of still in that era. And Gemini 2.5 Pro got up to about 95% accuracy. And that's really good. So what I was interested in is, when we began to see reports kind of on X that there were new models being tested by Google in AI Studio, which is their kind of playground app, I was just curious, how much better would this get? So, okay, you are hearing these rumors that there's this new mystery model inside AI Studio that Google tests new models in before they're released. What do you do?
Starting point is 00:56:09 Yeah, so Dr. Letty and I have a corpus of 50 different documents that we've been using to benchmark kind of how these models improve over time. They're all documents that we are pretty sure are not in the training data, because we've either taken them ourselves or they've come from sources that are not typically online. And you can't be 100% sure, but it seems to be the case. So I started to put a few of those documents in, and for your listeners who are maybe not aware, the way that the testing of these types of models often works is you kind of have to put in the document dozens of times before you get a hit on the model you're hoping to test, because it kind of randomly pops up. So it's not an easy thing to do. I managed to test
Starting point is 00:56:49 about five of our 50 examples, about 1,000 words, and the results were impressive, to say the least, in the sense that the error rate again declined by about 50% from where it had been with Gemini 2.5 Pro. And it got to about a 1% word error rate, which means one in every 100 words, obviously, you're getting wrong. But that can include capitalization errors, punctuation, stuff like that. So that in itself is really significant. No models come close to that. Human experts who do transcription for a living have about a 1% error rate. So that itself is fairly important. And your sense of having used this new experimental model, did that just come from
Starting point is 00:57:32 you putting in dozens and dozens of queries? And every once in a while, you would just get a result that was radically better than the others. And you thought, aha, I must be getting the new one. Or were there any other signs about what Google was showing you? Well, it's A-B testing. So what that means is normally in AI Studio, you put in a query and you get a response. And when you get the A-B test, you get two responses.
Starting point is 00:57:56 And it asks you to rate which one's better, right? And the labs do this in order to get feedback on, you know, is the model actually better on specific types of tasks than other ones, right? So you might have to do that 20 or 30 times until you get one of those two responses. And then the differences were pretty notable. So you said the overall error rate fell by about 50%. But that was not actually what impressed you the most about this new model. What impressed you the most? Yeah, so first of all, that
Starting point is 00:58:29 was, you know, impressive. And then I was curious, okay, if it's gotten to this point, how is it going to do on tabular data? And as historians, one of the things you work with, you know, to go back to your Starbucks example, are receipts and ledgers that come from, you know, merchants in the past. And a lot of that's fairly boring. But if you want to know where somebody is, where they bought their coffee one morning, and you want to trace that person's movements, you can use these types of documents to do that. You can see what they bought and all of those types of things. The thing is, to this point, models have been pretty bad on tabular data. It's kept kind of like a cash register receipt system is kept, so it's kind of just on the fly.
Starting point is 00:59:06 Nobody's expecting people to necessarily read it down the road. So it's difficult to interpret just by looking at it. It's also sometimes quickly written, so it's even worse handwriting than people are used to. And because it's historical documents, in this case, I'm dealing with records from 18th century New York State, upstate New York, in Albany. And those records are written in pounds, shillings, and pence. So that's the old system, a different base than we're used to using, in which you have basically a different form of currency measurement. And so when I dropped in a page, just kind of at random from this ledger, I was just curious to see what I'd get back. And suddenly, it not only came back with a near perfect transcription,
Starting point is 00:59:50 which itself was kind of remarkable given how difficult it is to make sense of what's actually on the page, but as I started to go through it, I was looking for errors, trying to find errors. And I began to realize that some of the things that I was seeing on there that looked like errors were actually clarifications, and they required the model to do some really interesting things. Give us an example. Sure. So in the actual ledger document, right, what we're dealing with is a series of kind of entries
Starting point is 01:00:14 that are made in a daybook. So this is as people come into a store, they're buying things, and it's being recorded just like on a cash register sheet. And in the one case that I was in particular looking at here, what it basically says in one of the entries is Samuel Slit came in on the 27th of March, and it says: to one loaf of sugar, 14 5, at 1/4, 0 19 1. And what that means when you actually break it all out is that this guy named Samuel Slit came into the store. He bought one loaf of sugar. If you're not aware, in the 18th century, sugar comes in hard conical shapes, and they break off pieces and sell it to you. And it says 14 5, sold at one shilling four pence per pound. And then the total is zero pounds, 19 shillings, and one pence. And this is the old kind of notation, right? And what I saw in the actual model's
Starting point is 01:01:10 response, though, was that it had figured out that in fact it was one loaf of sugar, measured out at 14 pounds 5 ounces of sugar, sold at one shilling four pence per pound, and then the total, right? And what's significant about that is that, in order to figure out that what was written on the page, just the random number one four five, was in fact 14 pounds and 5 ounces, the model had to be able to work backwards from a different currency system with a different base. The thing that makes that important is that models shouldn't be able to do that, right? These models are basically trained on pattern recognition. What they're trying to do is they're trying to predict the next token. And so the first problem here is that predicting
Starting point is 01:01:54 numbers is actually very difficult for models to do, right, in the sense that the model has no idea whether Samuel Slit is buying 14 pounds, five ounces or 13 pounds, six ounces, right? I mean, that's a random number, effectively. It's not probabilistic. The other problem is that although there would be, you know, a lot of material in the training data that would relate to this kind of old currency system, the reality is there's not that much of it as a percentage of the overall material, because there's so little of this out there relative to, you know, the sum total of all the records that exist. And so when we're thinking about it, the model is having to do some interesting things there. What it looks like to me is
Starting point is 01:02:37 it's a form of symbolic reasoning. I have to know in my head that I'm dealing with different units of measurement, which don't have a common kind of base to multiply or divide by. And then I have to kind of abstractly realize that these units of measurement are, in fact, comparable as long as we do some conversions, and we have to then move them around in our heads to figure it out. This is something that I had to think about for a second and realize that, in fact, the model had done something that was mathematically correct and unexpected. So what are the implications for you in your work of a model being able to do this kind of abstract reasoning? Yeah. And so as an historian, what it means is that, assuming that this replicates once we start to see the actual model come out,
Starting point is 01:03:28 you're going to be able to trust the models to do a lot of stuff that historians would normally need to do. Right? So it's one thing to transcribe a document. It would be another to say, here's a ledger, go through and add up all the sugar that was bought and sold in this ledger. And right now, you can't trust a model to do anything like that, right? You can't trust it to necessarily recognize sugar, come up with quantities, do that type of math. If we're getting to a point where models can begin to do that, you can begin to get them to do tasks that would take humans a very long time. Right. It sort of sounds like the equivalent of the moment where AI coding tools went from being a useful assistant for a person who's a professional programmer to actually being able to go out and program things just on their own with very minimal instruction. It's like that for history, right? Yeah, and I think that's a really good example. But I think the interesting thing about history here is that it's a very typical kind of knowledge work area, right? In the sense that a lot of the stuff we're doing is
Starting point is 01:04:32 pretty esoteric, and your listeners will probably be wondering, you know, who's really interested in how much sugar people bought in Albany in the 18th century. Well, Casey is, but he's a special case. Yeah. Yeah, that's fair. I'm really interested in this Samuel Slit and why he needed 14 pounds of sugar. Like, take it easy, Sam. It's true.
Starting point is 01:04:49 Well, he's a merchant. He also wants to go and sell it to other people, right? Oh, he's a dealer. There we go. He is. A sugar dealer. Yeah. But the interesting thing about this, I think, right, is that the stuff we do as historians with these historical records, this is what all knowledge workers do, right?
Starting point is 01:05:04 You take information and you synthesize it, you take it from one format, you put it into another, you realize the implications of the things that you're reading, and you draw conclusions and analysis based on that, right? And it can be 18th century sugar, but it can very easily be any other kind of widget that a knowledge worker uses. So what I'm seeing turning on here for historians is highly likely to start turning on in other areas as well,
Starting point is 01:05:31 that up to this point in the models, you know, we've been getting the sense that they're starting to get good enough where you can feel like, yeah, I think I can trust the outputs on this, but now you're getting to the point where it just works. And as somebody who uses coding assistants all the time now, it's a very similar situation, where you used to have to cut and paste back and forth, and it would, you know, it would never run the first time. You'd have to run it, you know, three or four times, paste the errors back and forth, and eventually it would work. And now you can just kind of hit the button and it almost always works, right? And that's what we're going to see here with knowledge work.
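(To make the ledger arithmetic Mark describes above concrete, here is a minimal sketch in Python. It assumes the entry reads 14 pounds 5 ounces of sugar at 1 shilling 4 pence per pound, with 16 ounces to the pound of weight, 12 pence to the shilling, and 20 shillings to the pound sterling; the function names are illustrative and are not from the episode or from Gemini.)

def cost_in_pence(lb, oz, pence_per_lb):
    # Weight uses 16 ounces to the pound; the price is quoted in pence per pound of weight.
    return (lb + oz / 16) * pence_per_lb

def to_pounds_shillings_pence(total_pence):
    # Old currency base: 12 pence = 1 shilling, 20 shillings = 1 pound, so 240 pence = 1 pound.
    pence = round(total_pence)
    pounds, remainder = divmod(pence, 240)
    shillings, pence = divmod(remainder, 12)
    return pounds, shillings, pence

total = cost_in_pence(14, 5, 16)           # 1 shilling 4 pence per pound = 16 pence per pound
print(to_pounds_shillings_pence(total))    # (0, 19, 1), i.e. 0 pounds 19 shillings 1 pence, matching the ledger

(Working backwards, as Mark says the model appears to have done, means starting from the quoted price and the 0 19 1 total and recovering the 14 pounds 5 ounces, which requires juggling the base-16 weight units and the base-12 and base-20 currency units at the same time.)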
Starting point is 01:06:01 So I want to zero in on what makes this so interesting. So we don't know at this moment that this is Gemini 3, but I think Kevin and I feel like it's highly likely to be Gemini 3, right? And we also don't know a lot about how, if it is Gemini 3, exactly how it was trained. But I think we can assume that it was trained in the way that its predecessors were, which was in part by just feeding it lots more data, lots more compute, right? Just sort of following the scaling laws. And there's been so much debate over the past year about are we seeing diminishing returns, right? Have we sort of figured out the limits of what we can get out of these scaling laws? The story that you're telling us, Mark, is a suggestion that, no, we have not gotten everything that there is to be gotten out of this increased scaling. And in fact, we should expect to see continued emergent properties from this ongoing scaling. And you've just given us an example of it right there. So that's why I think this is so fascinating. Yeah, and I was fascinated by this experiment. And I wanted to see if I could actually get to the bottom of what happened here.
Starting point is 01:06:59 So I asked some folks who would be in a position to know, like, hey, there's this, you know, history professor in Canada. He thinks he, like, stumbled onto this, like, unreleased Gemini 3 A-B test, and it was really good. And they said, lose my number. No, they were very tight-lipped. They did not want to talk about it. They are keeping things very secretive over there. But I was able to confirm that Google does test new models in AI Studio
Starting point is 01:07:32 And so I think if I were a betting man, it's a pretty good bet that what you experienced was, in fact, an unreleased model, probably Gemini 3. So, Kevin, I have not been in the AI studio myself recently to see if I could try this model. Have you made any efforts to try to access whatever this model is?
Starting point is 01:07:55 Yes. So I use AI Studio. People don't know this, but, like, Google has, like, you know, 800 AI products, and right now there are, like, you know, a billion ways to use Gemini. And the most effective way, the best way to use Gemini, is inside this product that basically no one except, you know, developers and nerds like us uses, which is called Google AI Studio. And if you go in there, I don't know, for whatever reason, Mark, do you find this too? But, like, the model, like, the version of Gemini in AI Studio is better than the one, like, on the web. I don't know why.
Starting point is 01:08:30 But I'm consistently able to get AI Studio to do things, like transcribing long interviews, that the regular old Gemini won't do. So anyway, I was in there this morning actually doing some research for our segment about Suncatcher, this Google project about putting AI stuff in space. And I was trying to have it summarize this research paper and give me some ideas and comparisons to what other companies are doing. And I got this A-B test, this, like, you know, choose between these two answers. And I am looking at it right now. It says, which response do you prefer? It has these two side-by-side things. And they basically both look pretty good.
Starting point is 01:09:10 I think the problem I'm identifying, Mark, is that unlike you, I am not smart enough to come up with, like, problems that are challenging enough where the difference between one pretty good model and a very good model is readily apparent. So maybe you can help me with that. Well, I mean, here's an idea. I know Mark really focuses on the 1700s and the 1800s in the fur trade. What about the 1500s? I bet you can make a dent. Yeah. Well, I will, I'll look into that. All right. Well, totally fascinating experience. And I can't wait to hear more about what you're doing with AI and history. This is a really interesting mystery that I hope we've shed some light on. Thank you, Mark. Thank you very much for having me. Hard Fork is produced by Rachel Cohn and Whitney Jones. We're edited by Jen Poyant.
Starting point is 01:10:36 Today's show was fact-checked by Will Peischel and was engineered by Chris Wood. Original music by Elisheba Ittoop, Marion Lozano, and Dan Powell. Media production by Soya Roque, Pat Gunther, Jake Nickle, and Chris Schott. You can watch this whole episode on YouTube at YouTube.com slash Hard Fork. Special thanks to Paula Szuchman, Pui-Wing Tam, Dalia Haddad, and Jeffrey Miranda. You can email us at hardfork@nytimes.com with what else you think we should build in space. Thank you.
