Hacked - Google Search Leaks

Episode Date: July 2, 2024

Few things impact the shape of the internet more than Google Search, yet its inner workings are mostly a mystery. In May, Rand Fishkin received alleged leaked documents that peel back the curtain as to how it works. We speak with Rand Fishkin about his involvement in the Google API leaks.

Transcript
Starting point is 00:00:00 In May of this year, Rand Fishkin received an email from a leaker. And I had never emailed with him before. I didn't know who he was. The email was labeled "Confidential Google." Rand is a public figure in internet marketing, and this leaker was claiming to have access to internal documentation about something very important and, up until this point, very secretive. These are wild accusations. I don't know if you agree with this, Scott, but maybe no one thing impacts the shape of the modern internet more than Google Search. If you remember the internet before Google, you would strongly agree with that sentiment, especially if you used some of the alternative search engines that used to exist.
Starting point is 00:00:50 I remember asking Jeeves. I forgot about Jeeves, actually. Right? A lot of people forgot about Jeeves. Today, Google accounts for over 80% of search traffic, with 8.5 billion searches a day, two trillion annually. These are just nonsense numbers at this point. 100,000 a second. The average person listening, you probably Google more times in a day than you eat meals. What Google serves dictates what information people consume.
Starting point is 00:01:18 How Google ranks websites dictates which websites live and die. Optimizing for that system is now an entire industry. And for a long time, all we've really known about that system is what Google tells us. Kind of for good reason, because you don't want it to get any more gamified than it already is. But really, everything about what Google tracks and doesn't track, what data they use and what they don't, all comes from statements from Google: their PR people, their executives. You know, what we know about Google Search is what Google has told us. There have never been any documents about the API
Starting point is 00:01:58 to confirm or, importantly, contradict those public statements. And now Rand's sitting there looking at something that potentially does: this email from this leaker, claiming to have a copy of a bunch of internal documentation about the Google Search API. And when he held these leaks up against those statements, Rand started to find some interesting stuff. When I finally got on the phone with him, he pulls up the trove of documents, and my mind exploded. The thing for me is, were the statements made in the same kind of time frame as the documentation that we've received?
Starting point is 00:02:41 The other thing is, why did Rand not just keep these documents private? Somebody gave him the key to the treasure chest. Like, Google's intentionally kept their search algo a black box and modifies it every time somebody gets close to figuring out ways to cheat it. Sure. Creating the industry you speak of, with thousands and maybe millions of SEO people and companies in the world these days. They even offer their own certifications, which don't tell you how to do it. So if you were given these leaked documents, you would be maybe one of the only people outside of Google that knew how to cheat the system.
Starting point is 00:03:25 And he chose to expose them publicly rather than build an industry out of it. So, I guess, kudos to Rand on that one. To the temporality, you bring up a great point. So did Google. And I think that for the people combing through these thousands and thousands of documents, figuring out that timeline is the rat's nest to untangle right now. As for Rand using his newfound knowledge
Starting point is 00:03:54 for personal gain, you know, I didn't ask him about that. Maybe I should have. A lot has happened in the weeks since he got that email, and I wanted to talk to him about it. Google is in the final stages of a big antitrust case with the DOJ.
Starting point is 00:04:10 It concerns Google Search and whether Google has engaged in anti-competitive practices. It is not the only case of this kind they have been involved in. Google, technically its parent company Alphabet, owns websites that compete on the internet. So how Google ranks websites matters not just to all of us, but to them. And that does potentially create a conflict of interest.
Starting point is 00:04:33 There is rising frustration, particularly with Google's self-preferencing behavior. Essentially, Google doing things inside of Google Search, where they own 95% of the market, to benefit other parts of Google. I wanted to know more about all this: the leak, Google's response, the experience of getting wrapped up in this story. So I reached out to Rand. Fun chat. I appreciate his time. This is my conversation with Rand Fishkin, co-founder of SparkToro and Snackbar Studio, about the Google API leaks, here on Hacked. Rand, thank you so much for sitting down to talk with me about this. Yeah, my pleasure, Jordan. Thanks for having me.
Starting point is 00:05:31 So I think for the average person, a lot of this can seem in the weeds. For normal internet users, to what extent is the internet shaped by Google's search algorithm? I can actually tell you this exactly, because I did some data analysis of clickstream panel data. Essentially, it's kind of like the Nielsen TV boxes from the 1960s, but fast-forward to 2024: a panel that collects data about every URL visited by tens of millions of devices. And 70% of all internet traffic is sent by Google. Wow. Pretty brutal.
Starting point is 00:06:12 Brutal. Why do you say that? Well, I am someone who believes that monopoly power tends to stifle innovation, creativity, and opportunity. And I think that Google's stranglehold on internet traffic and on what content people see and don't see really limits what is created. So for content creators, I'm sure you know this world well.
Starting point is 00:06:42 For content creators, you know, if you're in the video game space, what is able to be surfaced on Steam or the Nintendo Switch store or the Xbox store or the PlayStation store plays a huge role in what game designers choose to create. Similarly, when you make things for the internet, whether it's a YouTube video or an article about, you know, poison dart frogs in Central America or the best mustache wax for curly hair, you're going to change what you do based on what Google tells you is important and how you can potentially get traffic to your page. And so, you know, we kind of end up with this Google-shaped internet.
Starting point is 00:07:30 We're talking because there was a trove of leaked documents concerning the search algorithm that shapes that Google-shaped internet. And this leak came into your possession. Take me through that story, starting with, I think, the email that you got. Yeah, I pulled it up on my computer here because I couldn't quite remember exactly how it went. So I got an email from a guy in Tbilisi, Georgia. Georgia the country, not Georgia the state. And I had never emailed with him before. I didn't know who he was. The email was labeled "Confidential Google." And, you know, he says, "Rand, I know you've been out of the SEO industry for a while." SEO is search engine optimization. That's sort of the practice of
Starting point is 00:08:18 ranking web pages in Google and getting traffic to them. And I used to run a company called Moz, which is in the SEO software and education space. And he says, "You were the first person to highlight the influence of click data on search results. From what I've heard, Google went so far as to manually demote your experiments and publicly make statements that are far from truthful, including reputation destruction." There were several Google representatives who over the years said particularly harsh things about my work and research when I was at Moz. And anyway, this email goes on to cite all these examples, and the emailer claims to have proof: proof of how Google ranks pages, proof that Google has lied publicly,
Starting point is 00:09:06 dozens, if not hundreds, of times to news sources of all kinds, proof that potentially they lied to Congress when their CEO, Sundar Pichai, talked about how they use data. Potentially, there's even some suggestion in here that there were lies about the Department of Justice case that was prosecuted last year. These are wild accusations, right? I mean, if you get an email like this, you know, especially when I'm six years out from running Moz and, you know, outside the SEO industry,
Starting point is 00:09:46 I kind of look at this and go, well, I have to say this person sounds credible, but also incredibly far-fetched. And so extraordinary claims require extraordinary proof, right? So I write back to this person and sort of say, okay, thanks for telling me all this stuff. Like, what are your goals here? And they say, I think you should be the one to publish this leak data. And I want to show it to you.
Starting point is 00:10:13 So we schedule a phone call. This is how unexcited about it I was, Jordan. I received the email on May 5th. On May 23rd, I canceled the scheduled call with the guy because I wasn't feeling well. And then we rescheduled for later in May. When I finally got on the phone with him, he pulls up the trove of documents and my mind exploded. I mean, essentially what this guy showed me was the API documentation. An API is sort of how you make programmatic calls at scale.
Starting point is 00:11:03 It's the API documentation internal to Google's search engineering team. So imagine you and I work on Google's search engine. And imagine we're programmers there and we are trying to make Google Web Search even better. This is the list of all the types of data that we can call in order to build or modify an algorithm, a ranking system, right, to choose which pages appear before which other ones. And not just, by the way, not just Google Web Search. YouTube is in here. Google Android search is in here. Google Maps and local are in here. Google News is in here. All the different flavors of search that Google offers publicly to human beings. And as he showed me these,
Starting point is 00:11:52 we're talking about 2,500 documents containing 14,000 different attributes or features that you can call, right? So, you know, if somebody says, oh, Google Search is simple, it's just X, Y, and Z, you can be like, yeah, yeah, X, Y, and Z. It's 14,000 Xs, Ys, and Zs. Literally, 14,014 Xs, Ys, and Zs are what they used as of March of 2024. So we get off the call, you know, and I'm kind of losing my mind going through this. And the first thing I do is hit up a few people quietly in my network. A few people who used to work at Google as software engineers.
Starting point is 00:12:35 Three people in particular. The first one that I reached out to said, I don't want to talk about this, and I'm not willing, even anonymously, to broach the subject. Okay. But the other two said, yes, you know, happy to take a look. They took a look. And then they came back to me and basically said, yeah, this absolutely looks legitimate. I didn't personally have access to this document when I was at Google. You wouldn't have access to this
Starting point is 00:13:02 unless you were on that specific search engineering team. But this is absolutely Google-formatted, you know, internal speak throughout. There's almost no way this could have been faked. And then I talked to an expert in search ranking systems who I had known from my time in the industry, a guy named Mike King. Mike runs an agency in New York called iPullRank, and he and I have been friends for many years. Mike is ludicrously talented, just extraordinarily detailed in his research. He's been working on a book about information retrieval, which is the science of how search engines work. And so he's got this absolute plethora of relevant experience around this stuff.
Starting point is 00:13:52 So I show him the leak. I called him on a Friday night. He's out with his kids in Brooklyn at the park. He's like, okay, okay, wait, what are you telling me? All right, let me just go home and look at this. And I think he, you know, he stayed up all night and most of the weekend working on the leak. And then on Monday night, he and I both published blog posts describing this leak, sharing what was inside it. Obviously, the early analysis was very incomplete, but already there were dozens of features that were extraordinarily interesting and contradicted many statements Google had made in the past. And when we published that, Jordan, the internet exploded.
Starting point is 00:14:35 I mean, hundreds of thousands of visits just to my blog post, I'm sure to Mike's as well. You know, interview requests from two dozen publications, everyone from The Verge and The New Yorker to The New York Times, The Washington Post, The Wall Street Journal, and everyone else you can imagine. And Kara Swisher talked about it on her podcast, and, you know, it's the top of Hacker News. It's the top of Techmeme. It was an insane two weeks after that. And since then, you know, people have been analyzing this leak because it's public. Anyone can see it.
Starting point is 00:15:14 You can go right now and look at the 14,000 inputs that make up Google's search ranking algorithm. That had never been possible in the last quarter century. It's mind-blowing. I want to dig into what we learned about the API that we didn't previously know, and maybe start with this one, because I found it interesting. Google makes Chrome. Google also sells ads.
Starting point is 00:15:39 Google representatives have long stated that they don't use any information about users in Chrome for ranking, which is very important for selling advertising. And it always seemed kind of shocking not to use this massive trove of privileged data that you could potentially be gathering in your biggest business line over here. But these leaked documents maybe tell a story that that separation of church and state isn't quite so separated. To start with what's in here, can you tell me a little bit about that? Yeah, yeah, there's no separation at all. I mean, look at how Google measures, for example, one of the things that would happen when any human being performs a search. Let's say you are looking for... this
Starting point is 00:16:23 happened to me recently. I was looking up the Aquarium of the Pacific, which I think is in Long Beach, California. I wanted to find out how long they were running their frog exhibit. They've got a new frog exhibit that just launched. It showed up in my Google News feed. I was like, ooh, I want to go see poison dart frogs when I'm down in California over the summer. And so I do this search and I click on the first result, which is about the event. But it doesn't say how long it's running. So then I click back to Google's results. I scroll down a little bit until I find a press mention, right, someone talking about them in the news and saying, oh, the exhibit is planned to be permanent. I click that one and, okay, great. So I don't have to worry about when it's
Starting point is 00:17:12 going to happen. Here's what Google's documentation says. If you click on a search result and then you click the back button and you choose another result, that suggests to Google that the other result is probably more relevant and deserves to be higher up in the rankings than the one you left. You left a result, your search was unsolved, and then you bounced back to the search results and chose a different result. And once you went to that one, your search was resolved. This is called pogo-sticking. It has long been used in the information retrieval literature. And here it is, right in the documentation. You can observe that Google is not measuring this through Google Analytics, as many people speculated for a long time.
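To make the pogo-sticking idea concrete, here is a minimal Python sketch, assuming a simplified model in which each search session is just the ordered list of result URLs a searcher clicked. The function names, the data shape, and the counting scheme are hypothetical illustrations; the leaked documentation, as described in this conversation, names click signals like this but does not show how Google computes or weights them.

```python
from collections import Counter


def pogo_stick_pairs(session_clicks):
    """Given the result URLs clicked, in order, during one search session,
    return (abandoned, preferred) pairs: each result the searcher backed out
    of, paired with the different result they clicked next."""
    return [
        (left, chosen)
        for left, chosen in zip(session_clicks, session_clicks[1:])
        if left != chosen  # re-clicking the same result is not a pogo-stick
    ]


def tally_preferences(sessions):
    """Across many sessions, count how often each URL was abandoned and how
    often it was the final click (treated here as the one that resolved
    the search)."""
    abandoned = Counter()
    resolved = Counter()
    for clicks in sessions:
        for left, _ in pogo_stick_pairs(clicks):
            abandoned[left] += 1
        if clicks:
            resolved[clicks[-1]] += 1
    return abandoned, resolved
```

With the aquarium search above, `pogo_stick_pairs(["aquarium.example/frog-event", "news.example/frog-exhibit"])` yields a single pair: the event page was abandoned in favor of the news mention, which is also the session's resolving click. The URLs are placeholders.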
Starting point is 00:18:00 They're not just measuring it by looking at what happens on their search results page. They are looking at the billions of devices that use Google Chrome as their browser to be able to measure this. And this is only one of hundreds of uses of Chrome data inside the ranking systems. As another example, which I find particularly fascinating as someone who was in the industry: many people know that one of the ways that you rank higher in Google is to get lots of links pointing to you, right? If lots of other pages on the internet link to your page, that tends to suggest to Google that you are more important than someone who has very few links. So inside the leak, you can see that Google uses Chrome
Starting point is 00:18:48 data, traffic data, to demote or increase the value of links that come from pages that either don't receive or do receive traffic. For example, if you are linked to by an article in The Economist that got a lot of traffic, that link is probably worth much more than a link from, you know, randomwebsite.net that gets no traffic. That wasn't always true, by the way. Google used to be very, very manipulable. Back when I started in the industry, you could get a bunch of links from a bunch of different scammy, sketchy websites and rank nearly anything anywhere you wanted to. But Google managed to find a really clever solution to this using traffic data from Chrome.
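The traffic-weighted link idea can be sketched the same way. This minimal Python example assumes a made-up weighting rule: a link's value grows logarithmically with visits to the linking page, is zero for pages with no observed traffic, and caps at 1.0 at a hypothetical `saturation` threshold. None of these numbers or formulas come from the leak; the conversation only says that Chrome traffic data is used to demote or increase the value of links.

```python
import math
from collections import defaultdict


def link_weight(source_visits, saturation=100_000):
    """Hypothetical weight in [0, 1] for one link, scaled by traffic to the
    linking page. Zero-traffic pages contribute nothing; the weight grows
    logarithmically and is capped at 1.0 once traffic reaches `saturation`."""
    if source_visits <= 0:
        return 0.0
    return min(1.0, math.log1p(source_visits) / math.log1p(saturation))


def link_scores(links, traffic):
    """Sum traffic-weighted incoming links per target URL.

    links:   iterable of (source_url, target_url) pairs
    traffic: mapping of source_url -> observed visits (missing pages count as 0)
    """
    scores = defaultdict(float)
    for source, target in links:
        scores[target] += link_weight(traffic.get(source, 0))
    return dict(scores)
```

In the spirit of Rand's example, a link from a heavily read article on a placeholder `economist.example` page counts in full, while a link from a zero-traffic page on `randomwebsite.net` adds nothing to the target's score. The logarithm is one arbitrary choice to keep a handful of very popular pages from dominating; a real system could use a very different curve.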
Starting point is 00:19:37 Google has long stated, and you gestured towards this, that they need to balance what information they make available publicly about how search works, frankly because the more that's public, the more gameable it is by everyone from full-blown scammers all the way over to professionals optimizing for search engines. Talk to me about the balance that we're seeing in these leaks between transparency and making the internet even more broken in certain ways than it already is. Yeah, I'm going to throw out there my suspicion that 10 years ago, maybe 15 years ago, if a document like this had leaked, it would have been quite damaging to Google's ability to organize the web and make its information
Starting point is 00:20:24 useful. I will grant that. And I think that's because Google just wasn't that sophisticated back then, right? The systems for ranking were gamable. They really were. I look at this leak today, And everything I've observed in here suggests to me that Google is nearly bulletproof. Go spam all you want. I don't think you're going to break through. This system is not only sophisticated and elegant, but it is crafted in such a way that in order to game the system, you would have to be really useful to real human beings and a lot of them. If you are, is it really spam anymore?
Starting point is 00:21:11 Right? Like if you don't make things that achieve real popularity, that real people link to from their websites, that real news sources talk about and pick up, that get traffic, then even if you were to game all the other signals, once you start ranking in Google, if your content doesn't successfully answer lots of searchers' queries, you're going to fall out of the rankings and someone else will rise. So I really don't see a downside to Google sharing this. I think if some conspiracy theory 10 years from now is like, oh, actually it wasn't a leak, they put it out there intentionally and I've got the email to prove it, that wouldn't totally shock me, because very frankly, this is useful information for not getting scammed by sketchy
Starting point is 00:22:02 SEO providers, but it is not a roadmap that is going to tell you, oh man, if I just put the number seven in my title tag 12 times, I'll rank at the top. There's nothing like that. Sure. There's no, like, name your business AAA Plumbers because the three A's come first in the telephone book. Yeah, right. It's not the White Pages
Starting point is 00:22:24 in 1985. Sure. Exactly. Has Google publicly commented on the documents that were leaked to you? So I did get a private email from a Googler the night I published it who was quite upset
Starting point is 00:22:39 with one characterization of how I described an event, and I did change it in the post. And then, I believe it was the next week, Google made a public statement through a PR person that the leak was authentic, but they urged people not to misread, you know, potentially incomplete data. And, in fairness, some of the features in the leak do reference other data sources that we can't access. For example, there's a whitelist of approved election news providers, right? So that if, you know, it's January 6th, 2021, and you
Starting point is 00:23:31 you're an American and you search for who won the election, you know: was there a dispute? Is there any evidence that the election was problematic? Google wants to make sure that the accurate truth is represented in their results and not someone who, you know, is misrepresenting that. And you can certainly imagine that in the political spectrum, it would not be difficult to replicate all the signals that you might need, including popularity and news references and links and clicks and all that kind of stuff. I have the quote here from the Google spokesperson. It was that they would, quote, "caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information," which, as you said, is importantly not saying these leaks aren't real. What do you read that statement to mean?
Starting point is 00:24:18 Well, I don't think it means anything, because it doesn't even say that these documents are outdated or that they're inaccurate. It just says we caution against, generally, any information that is out of context or outdated. These documents are in context. In fact, many of the features are very well described, such that if you and I were new engineers who joined the Google search team, we could read these documents and be like, oh, okay, I get what that means. That's where I call the Chrome data that tells me what percent of people click the back button after searching for this. And this is the one where they have this thing called squashed and unsquashed clicks. Squashed clicks, you know, they describe as clicks that their spam system thinks are not from real human beings on real devices. And so they don't want to count those clicks, that kind of thing.
Starting point is 00:27:42 Is there anything else we learned about Google Search from these documents that we haven't talked about? We talked about Chrome. We talked about the pogo-sticking stuff. Is there anything else we learned in these documents that the average internet user might want to know about? Ooh, average internet user is a good qualifier there. I think there are a tremendous number of things that should be extremely
Starting point is 00:28:27 interesting to anyone who creates or publishes content on the internet and wants that content to do well. If that's not who you are, the number of things that apply to you is limited. But I will say one of those things that I think folks should probably keep in mind and be aware of is that when Google's public representatives make statements about how Google works and those get quoted, potentially uncritically, in the press, this document suggests that was probably a mistake, right? Google's public statements, especially about how search works and what they care about and what's important and, you know, what will and won't affect your rankings, you probably should take with a grain of salt. Because this documentation suggests that somewhere between dozens and hundreds of times in the last 20 years, Google has been directly misleading or, you know, straight up lying about those things. And in my blog post, I urged
Starting point is 00:29:32 especially industry commentators, you know, podcasts like yours, folks like Kara Swisher, coverage at The Verge, Search Engine Land, right, all of these publications. I urged them to take a critical view of statements that are made publicly by Googlers, because they are potentially in the best interest of Google, but they're not always accurate and are sometimes directly, provably wrong. And I think we should treat them a little bit more like we treat statements from politicians, right? I think the job of a journalist is: don't tell me that, you know, Jordan said it's raining and Rand said it's sunny outside. Go outside and tell me what the weather is.
Starting point is 00:30:15 Since you brought it up, what about that former group, people who regularly publish content on the internet? What's the headline for them? Oh, God. The headline for them is you should probably follow and pay close attention to people who are studying and analyzing and extracting value from this, because there are hundreds of takeaways that are both actionable and probably different from things you've done in the past or learned as best practices. Gosh, just today, Cyrus Shepard, who I used to work with at Moz and who's an expert in search (he was actually one of Google's quality raters for a couple of years, which is quite interesting), noted that there was a finding from another party inside the Google leak
Starting point is 00:31:06 that something called content effort is scored inside the Google leak. It's a factor that essentially comes from human quality raters. There are these thousands, tens of thousands of people who work for Google through a contractor; they visit websites, and they're supposed to write about them, right, and sort of score them and say whether they're good or bad and all sorts of features about them. And one of the things they're asked to do is say, did it look like a human being spent a lot of effort manually to create something uniquely valuable, right? Differentiated and valuable to people with this resource. And that score now appears to be produced using a large language model. So it basically takes the input from all these quality
Starting point is 00:31:54 raters, builds a metric, you know, a sort of algorithm, and now it's scored through an AI system. And I found that totally fascinating, right? Essentially, you've scaled up what quality raters used to do manually and done it with an AI. And that's being used, according to the documents, in the ranking system. Wow. I want to talk about the AI thing, because I think there's something really important there. But the anonymous source, just to go back to that: since then, someone has come forward saying that they are that anonymous source. Erfan Azimi, I believe, is their name. Yes. Yeah, so with Erfan, I think it was only about two or three days after the leak was published. I think he sort of saw that the reception was generally favorable and not attacking,
Starting point is 00:32:43 right? Not sort of critical of the source or of the credibility of the data. And he came forward as the anonymous leaker. And he's since, you know, done a few interviews and talked publicly about it. He's a really interesting guy. We don't agree on 100% of things. But Jordan, I actually find him to be quite a lovely human being. He's got a sweet and empathetic and sensitive side, and I think a really strong sense of justice, too. What's your sense of why he chose to leak these documents? Because as you said, it could have gone either way. The public reception could have been one thing. Google's reception could have been another thing. Why do you think he chose to leak these?
Starting point is 00:33:24 To be honest, I think his stated reasons are accurate. I've seen nothing in his behavior before or since to suggest that he had anything other than a deep frustration and anger that Google had misled people in the field of content creation and the field of marketing and the technology world and press overall, but potentially even some of the legal cases against Google. I think he wanted the record to be set straight. And I think he felt an obligation to make this data available to people. That's something I share, right?
Starting point is 00:34:07 For 17 years of my life, you know, my whole mission was: how do I make search more transparent? That was Moz's whole goal. I think that's what built that company: making this dark underbelly... You know, when I started in SEO, Jordan, it was seen as a scam. People thought that everyone in SEO was just sketchy and terrible and, you know, manipulative. And it took years, took decades, to make it a mainstream marketing practice that every company now invests in, right? Almost every company in the world has someone
Starting point is 00:34:44 who thinks about or works on SEO. It's no longer seen as sketchy or spammy. It employs millions of people worldwide, obviously because Google sends 70% of the internet's traffic, right, outgoing click traffic. And so this is a little bit of a dream come true, to be able to share this, even though I'm out of the field. And I think for Erfan, being still in the field, he really wanted people to know the truth. You were talking earlier about the usefulness of Google, and I'm fascinated by this, the sort of rise and fall of how we find information. I think there's a feeling amongst a lot of people right now that Google is becoming increasingly less useful for finding authentic human-created information,
Starting point is 00:35:34 or it can be in certain use cases. If you want something written by a person without any commercial motivation behind it, basically Google is just a tool for appending "Reddit" to your search query. If you want to find some good... What do these leaks tell us about the usefulness of Google search for people looking to do more than just window shop in the mall that the internet is becoming? Ooh, you know, to be honest, I'm not sure that the leak reveals anything in that particular direction. Instead, I would say that what you'd want to look for in these cases is what people do, statistically speaking. You know, let's take a panel of tens of millions of people like the Datos panel. And let's look at, you know, how many searches did they do over the last two years each month? And was that number growing or shrinking?
Starting point is 00:36:28 And when they do searches, where do they go after they search? Do they stay on Google? Do they click on a paid ad? Do they click on what's called the organic results, right? The SEO results, which are unpaid. Do they click onto something that Google owns, like YouTube or Google Maps or Google Flights, Google Hotels, Google Finance, right? All those kinds of things.
Starting point is 00:36:52 And that, to me, really gets at the core of this question. Is Google less useful? What's it a tool for? And to be honest, I say these things because I have this data. I'm actually looking at some of it right here in Excel. And my read is that Google's perceived usefulness to many people, especially some influential people in sort of tech and journalism, has probably shrunk. But we are not using it less. We as human beings continue to use Google
Starting point is 00:37:34 more and more. You could argue, "I just use Google to find things on Reddit." Maybe you do. Maybe you just use Google to find things on Amazon. I do that too. Maybe you just use Google to find this thing that your friend mentioned, and to look up how old Tom Cruise is and how young the girl he's dating is, and, you know, like, okay, that's fine. But you're using it. You're using it more and more. And so I find it hard to believe that Google is actually becoming this non-useful thing. Certainly their dominance in terms of market share has not been challenged. A lot of people said, for example, that OpenAI and ChatGPT, those were going to take
Starting point is 00:38:14 tons of market share away from Google. "I use Perplexity now to do all my searches," or whatever it is. Maybe you do, but you're in a very, very tiny group, my friend. You know, if you look at the stats, it's just not true, right? Bing has made negligible, you know, less than 1%, market share progress since sort of the rollout of generative AI. That's interesting. You're this close to having a Google marketing campaign showing that these leaks prove just how useful we are. I mean, you know, what the leaks definitely prove
Starting point is 00:38:50 is that these are very smart, talented, capable people who've built a robust system that is incredibly hard to game or manipulate. And I don't think this leak is going to hurt them at all. I do think it's going to help a lot of people who previously speculated wrongly about how Google worked, and it's going to help a lot of people who maybe hire these services. You know, they've got a small business, they've got a consultancy, they have a restaurant, they have an art studio, and they want to do well in Google, but they don't know exactly how, and so they hire someone who claims to do these things, right? You can now verify: do they know what they're talking about? When I read the conversation around these leaks and I look at all these things and someone tells me, oh yeah,
Starting point is 00:39:42 we're going to do X, Y, and Z, and you go, but I don't think Google uses X, Y, and Z. Or if they do use X, Y, and Z, they control for it with A, B, and C. Are you going to do those things, too? So I think this allows for a lot more sophistication from small business owners, people who would hire the services of marketers, and marketers themselves. So it seems like what this story is really about is that tension between what Google has long said about how search worked and what we've seemingly learned about how search works. And that really matters because they drive 70% of internet traffic. And Google is changing again. There are a lot of questions about what's going to happen to the, you know,
Starting point is 00:40:24 ecosystem that is the internet as we start having large-language-model-generated, you know, summaries of content. And they're saying things about how that's going to work and the impact it's going to have. And we've just learned something about what it means when they say something. As we look to where Google's going, you know, what do these leaks tell us? Well, I look at this a little bit holistically, and one of the things that I see is, you know, in the Department of Justice
Starting point is 00:40:54 case, in the FTC cases, in the congressional testimony, in some of the EU court cases, you get the sense that there is a rising frustration with, particularly, Google's self-preferencing behavior: essentially, Google doing things inside of Google search, where they own 95% of the market, to benefit other parts of Google. And I, you know, Jordan, I like to go back to the original antitrust laws in the United States, right, which I think were from the 1880s or 1890s, when my understanding was U.S. Steel was the only steel that the railroad would carry. And so as a result, they owned a monopoly on the steel trade in the United States.
Starting point is 00:41:51 And the government said, you know what? We need laws around this. This should be illegal. You shouldn't be able to use your ownership of a railroad and of rail lines to control whose steel can make it from one part of the, you know, from whatever, the mine to the port. And so they created antitrust laws. And if they were fairly applied, I believe that Google would not be able to do what Google does today, which is to say when you search Google,
Starting point is 00:42:24 a quarter of all the traffic we send is gonna be to our own other properties. It's gonna go to YouTube, not your video hosting provider. It's not gonna go to Vimeo, it's not gonna go to Wistia, it's not gonna go to CBS News' in-house video, it's going to YouTube because we make money that way. Same thing with Google Maps. It's not going directly to the
Starting point is 00:42:47 local business, it's going to Google Maps and Local, which makes sure that you download that application on your phone so we can track everywhere that you go, and you leave reviews on Google Maps so that the Google reviews can win out over Yelp or TripAdvisor or any of these other players. When you search for stock ticker symbols, we're sending you to Google Finance. When you search for news, we're going to send you to Google News. When you search for flights, we're going to put the Google Flights box at the very top so that all the airlines need to pay Google Flights, not the OTAs. So, you know, this is just on and on. There are probably two dozen examples I haven't named. And in all of these cases, that sounds a hell
Starting point is 00:43:26 of a lot to me like using your railroad to only transport your own steel. Just saying. Where can people find the leak if they want to dig into it further? The leak is currently hosted on a platform called HexDocs, which copies code from GitHub. If you want to find the specific one, you can go to my blog post or Mike's blog post. So if you search for Google Leak and look for SparkToro or iPullRank, you will find the link to the documents themselves, if you're technical enough to read API documentation. And there are lots of analyses out there that I urge folks to check out. Amazing. Well, thank you again. It's been fascinating. My pleasure. Thanks for having me, Jordan.