Command Line Heroes - Looking for Search

Episode Date: June 1, 2021

The web was growing quickly in the ’90s. But all that growth wasn’t going to lead to much if people couldn’t actually find any web sites. In 1995, an innovative new tool started crawling the web. And the search engine it fed opened the doors to the World Wide Web. Elizabeth Van Couvering describes trying to find websites before search engines, and how difficult it was becoming in the early ’90s to keep track of them all. Louis Monier talks about having to convince others how important search engines would become, and he showed them what a web crawler could do. Paul Cormier recounts taking the search engine from a research project to a commercial one. And Richard Seltzer wrote the book on search engines, helping the rest of the world see what a profoundly vital tool they would become. If you want to read up on some of our research on search, you can check out all our bonus material over at redhat.com/commandlineheroes. The page is built in the style of 1995. Check it out. Follow along with the episode transcript.

Transcript
Starting point is 00:00:00 It's the winter of 1995. You're an early adopter cruising the Internet for all the latest details on, I don't know, Pixar's new movie, Toy Story. Or maybe NASA's Galileo space probe. The World Wide Web is your oyster because a brand new tech has arrived that lets you search for anything online. It's unlike anything that came before. There's a flood of information at your fingertips. And now you can navigate it with total ease. This website for searching other websites is going to change the world.
Starting point is 00:00:42 It's got a weird name, but who cares? You type it in and smile. AltaVista.com. All season, we've been looking into the tech breakthroughs that made 1995 one of the most extraordinary years on record. It was the year the dot-com bubble began and 16 million internet users suddenly showed up. With their arrival came a flood of new content. And with that flood came a pressing new need, the need for navigation. How do you follow every thread in a worldwide web? How do you find that crucial piece of information you've been hunting for? At each point in history, we've had to invent new ways to organize our data. The more data we got, the more creative we had to get. In ancient
Starting point is 00:01:33 times, we invented alphabetical order, indexes, and tables of contents so we could find what we needed in books. More modern inventions, like the Dewey Decimal System, organized huge libraries of knowledge. But digging through something as giant as the World Wide Web required an invention more powerful than anything that had come before. And 1995 was the year that invention arrived. I'm Saron Yitbarek, and this is Command Line Heroes, an original podcast from Red Hat. As of the year 2020, there were close to 5 billion internet users on the planet.
Starting point is 00:02:18 5 billion people posting content, tagging themselves, streaming videos, writing posts, and researching term papers. To get it all done, they searched through more than 4 billion web pages. That's a lot of content. For perspective, the New York Public Library system has just 55 million items. Our ability to sort through all that content isn't just convenient, it's fundamental. Without a way to search the web, the web as we know it would not work. In fact, before 1995, there really wasn't
Starting point is 00:02:53 a good way to search. And I actually even have a book that I used to write down websites in so that I could help people at the cyber cafe find things that they wanted. Elizabeth Van Couvering is an assistant professor in media and communications at the University of Karlstad. She's been studying the history of search engines for about as long as they've been around. And she describes the early web as a flat and static place with a bunch of directories to help you find your way. The first directory was made in 1991 by Tim Berners-Lee himself. He called it the Virtual Library. Sounds impressive, but it was honestly just a list of sites. New websites were sent to Berners-Lee, and he'd post links on his library page, organizing them into categories like anthropology and biosciences.
Starting point is 00:03:46 He was getting, at most, 100 visits a day. But early web directories like that? They were updated manually. In Yahoo's early state, there were editors who kept a list. There was also Infoseek, where webmasters manually submitted their pages for inclusion. But web search as we know it hadn't yet evolved. It soon became apparent that the work was not very possible, actually. It simply wasn't possible to do it on a human scale because you just needed too many people and there still was no funding.
Starting point is 00:04:27 Van Couvering's point about funding is an important one. In the early days, most people didn't understand what search was going to become. So there wasn't much money behind the idea. The internet was a niche, not quite a global phenomenon. And the idea you could get enough traffic at a search site to make money through ads, which maybe seems obvious today, was not so obvious back then. That early site Infoseek, for example, they tried a subscription model selling access to their list of websites for 10 bucks a month. And you were limited to 100 searches each month. These companies were searching all the time for what was going to be the business model.
Starting point is 00:05:16 That problem, finding the right business model for search, was going to take a long time to solve. We'll get back to it later. But in the meantime, there was another problem. How do you get a technology that's useful enough, that's good enough, to actually sell? Too much demand and not enough supply. That's Louis Monier, a true web pioneer. He told us how, back in the early 90s, only a tiny fraction of the internet was indexed by anybody. On top of that, nobody had a handle on how fast this thing was growing.
Starting point is 00:05:46 Partly, it was a hardware problem. Machines were underperforming, and people were getting timeout responses when they tried to search. As a user of the internet, I was frustrated with what was available. At the time, Monier worked in a research lab at Digital Equipment Corporation, better known as DEC, along with computer scientists Paul Flaherty and Michael Burrows. The team had seen the Mosaic browser, which arrived in 1994. They'd seen the easiness of Mosaic, its usability, and it inspired them to start thinking about other tools that could make the web more usable. A search tool was front of mind.
Starting point is 00:06:25 It was not, though, front of mind for all their bosses at DEC. The management was sort of confused for a while because this didn't look like anything they knew. So there were a few people who were sufficiently interested in change that they understood what it was and they took the bet that this was actually going to turn big. But most people were just really confused. What is it? How are you going to sell it? Who's going to use it? Why do people care?
Starting point is 00:07:00 One of the DEC folks who did believe that search was worth pursuing was a DEC product engineer named Paul Cormier. His job was to help promote the new creations that came out of their research labs. Digital was looking for ways to commercialize some of their more interesting research technologies. And so way back then, they started an organization to look at trying to productize some of these interesting research projects. So actually embedding product engineers into the research groups to try to look at promising ones that we would bring to market. And I was the first one that started that group. At the time, DEC was the second largest computer company in the world after IBM. And like IBM, they prided themselves on funding their own research.
Starting point is 00:07:52 They had four major labs like the one where Monier worked. Digital had a huge research organization. The researchers could really do anything that they thought was interesting or what their background was. And so the researchers in Palo Alto started search because it was just interesting at the time. Search was interesting because it hadn't yet been done in a big enough way. And the team at DEC came to realize they had a chance to deliver something new. A first real search engine for everyday users. They called it AltaVista.
Starting point is 00:08:38 I remember being like in the spring of 95, just lying on a mattress by the pool and taking notes about, you know, how to design something to do that. And then by the 4th of July, 1995, I was starting to crawl the Internet. Did you catch that term Monier used? Crawl the Internet? A web crawler, or a spider they're sometimes called, is a program that crawls over the web and indexes everything it finds. You start on, say, the Yahoo homepage. Your program analyzes what's there, finds all the links, and then follows those links. On the new web page, you repeat. From link to link, you crawl the whole World Wide Web. Or that's the idea. Trouble is, all the crawlers back in 1995
Starting point is 00:09:29 couldn't keep up with how fast the web was expanding. Most web crawlers were what's called single-threaded, meaning they were doing things one after the other. So fetch a web page, analyze the links, note those links somewhere, pick the next link, fetch that page, keep going. But if you're crawling to one link at a time,
Starting point is 00:09:53 you can only do maybe 20,000 sites in a day. So when Monier built his crawler, it wasn't going to be single-threaded. I wrote a web crawler, which didn't do one thing at a time. It did literally 1,000 things at a time. It had 1,000 threads. So think of them as little programs that work next to each other independently.
Starting point is 00:10:19 And each would fetch a page and then go to sleep. And let someone else use the computer while waiting for the answer. And that worked really well. So it was basically working a thousand times faster than other web crawlers. That crawler, which they called Scooter, was one of several advances built into AltaVista. It was the first search engine to index the entire text of each web page. The first to allow searches of images, audio, and video. Two years after launch, it even had a translation feature called Babelfish, which massively increased the number of pages you could make sense of.
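The multi-threaded crawling Monier describes, where each thread fetches a page and sleeps while it waits so others can use the machine, can be sketched roughly as follows. This is only an illustration and not Scooter's actual design: the toy link graph `TOY_WEB`, the `fetch_links` stand-in for a network request, and the worker count are all assumptions made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "web": each address maps to the links found on that page.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["d.example", "a.example"],
    "d.example": [],
}

def fetch_links(url, delay=0.01):
    """Stand-in for a network fetch: wait, then return the page's links."""
    time.sleep(delay)  # while this thread sleeps, other threads can run
    return TOY_WEB.get(url, [])

def crawl(start, workers=8):
    """Crawl outward from `start`, fetching each batch of pages in parallel."""
    seen = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Every page in the current frontier is fetched concurrently.
            results = pool.map(fetch_links, frontier)
            next_frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen

pages_found = crawl("a.example")
```

A single-threaded crawler would spend the full fetch delay on every page in turn; here the waits overlap, which is the speedup the threads are buying.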
Starting point is 00:10:58 All these features were being utilized to hunt through hundreds of millions of pages. And thanks to DEC's line of Alpha 64-bit microprocessors, AltaVista could work at speeds that made searching feel as easy as thinking. So it would respond in a fraction of a second, which people had just never experienced. By the fall of 1995, Monier had released an internal version of the search engine to the staff at DEC, and they seemed to like it. I had no idea how much people were starting to rely on it. Until one night, I turned off the machine in order to, I don't know, reload a new version
Starting point is 00:11:36 or something like this, and my inbox just explodes with messages of people saying, you know, turn it back on. I can't do without this. Louis Monier's team had taken the idea of search and turned it into something indispensable. Next, AltaVista became available to the public in December 1995. And within a couple years, there were 80 million users. AltaVista probably accelerated the development of modern search engines by a few years. It was sort of worldwide. Everybody was using AltaVista. So I guess that's the thing I'm the most proud of, you know, by pushing the envelope a lot, not a little bit. Richard Seltzer is a novelist these days, but he worked at DEC for 19 years. When AltaVista was first released,
Starting point is 00:12:27 it was Seltzer they called on to write a book explaining the value of search. Part of the fun I had in writing the book and in making speeches about it and everything was pointing out to people
Starting point is 00:12:37 the capabilities they had in their hands and they had no idea that it was there. His book was called The AltaVista Search Revolution. And it was a revolution they were talking about. While researching his book, Seltzer began to see
Starting point is 00:12:53 something profound in what Monier had done. The switch from old databases to jet-fueled indexes was a game changer. Before AltaVista came along, if you wanted to find something, you had to categorize it and put it neatly into a database before you'd ever be able to get
Starting point is 00:13:12 anything out. The problem with databases, databases being based on categories, and categories being based on what you know of the world today and how you think of the world today. But the problem is the world doesn't stay still. The older database approach only worked if you knew how things were organized in the first place. But the index approach allowed the automated retrieval of whatever you wanted. And that meant everything was yours to discover. It was being able to find the needle in the haystack without touching the haystack.
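The index approach described above, where every word on every page can be looked up without anyone assigning categories first, is commonly implemented as an inverted index. The following is a minimal sketch under that assumption; the page names and text are invented for illustration, and nothing here reflects how AltaVista's indexer was actually built.

```python
from collections import defaultdict

def build_index(pages):
    """Map every word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical two-page "web" used only for this example.
pages = {
    "toy-story.html": "Pixar releases Toy Story in 1995",
    "galileo.html": "NASA Galileo probe reaches Jupiter in 1995",
}
index = build_index(pages)
matches = search(index, "galileo jupiter")
```

Because the lookup goes from word to pages, a query never has to scan the pages themselves, which is one way to read the needle-and-haystack remark.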
Starting point is 00:13:52 Seltzer's book opened a door for people, revealing not just the power of search, but the flip side of that coin, the power of search engine optimization. Of course, they didn't have that term yet. He called it... Flypaper. See what he did there?
Starting point is 00:14:09 Because flies get caught in webs, but sticky flypaper catches even more flies? Okay, maybe SEO is better. But Seltzer deserves credit for seeing that search was going to be about more than just ordinary users. It would be about businesses too. Sometimes the best way to find that unique person or set of people out there isn't to search for
Starting point is 00:14:32 them, but rather to set up web pages that lead to them finding you through their searches. Not everybody at DEC saw that search had a business angle. In fact, they didn't know what they had. They didn't quite get it. At heart, it was a hardware company with an image of technology that was rooted in the past. And that meant, amazing as AltaVista was, it never had a long horizon. Being a leader in the field doesn't mean much if you don't know what you're selling. So there was no more interest in this,
Starting point is 00:15:10 and we sort of run out of steam. And that literally is when Google started. Elizabeth Van Couvering told us a series of business failures caused AltaVista to sink after a few years at the top of the search pack. First and foremost, they failed to monetize, while others were not so shy. In about 1996, Yahoo began to take advertising and everything became different. People really, really, really needed a funding model. By this point, Yahoo had outgrown their old directory days. They were indexing the web, just like AltaVista, and they'd found a way to make money at it too.
Starting point is 00:16:01 Then in 1997, a series of mergers took place. Big players came to realize that search was the future. Some of the biggest companies in media were swallowing up the search startups. Disney, for example, bought Infoseek. And who bought AltaVista? Well, their parent company, DEC, was bought by the computer company, Compaq. Sadly, Compaq didn't understand what they'd bought. The thing that business people at that time understood about search is that it got people to one place. And then they did not understand that search is a transit point. So they wanted to keep everybody in the search engine so they could show them ads in the search engine. What Van Couvering is describing is the web portal model that Yahoo
Starting point is 00:16:54 and AltaVista both tried out. The vision, a failed vision as it turned out, was to build a walled garden, a complex homepage, a place that attracted people and held on to them so advertisers could display their wares. That's not a business model that works for search engines. Search engines need people to come and go, right? A search engine, to be good, has to cover the whole web, not just the part of the web where you can show advertisements. And so Compaq had
Starting point is 00:17:26 signed AltaVista's death warrant. They didn't have any idea how to make this into a real venture, and they didn't want to invest in it. And the clever engineers, the research people of DEC couldn't help them understand because they also didn't know the business model. But there wasn't any more research funding coming because Compaq, unlike DEC, was not a research-focused company. At this point in the late 90s, nobody knew for sure how search was going to be monetized. Marketing teams, for example, were looking to tweak results for their clients. They were poking at AltaVista and they were poking at Lycos and they were poking at Infoseek.
Starting point is 00:18:10 We now had pressures to potentially make some people look good on results or how can we give people what they want or should we sell links? And there was a pressure to include paid-for links in the index, and that became a very conflicted space. Van Couvering's research suggests that the tech of AltaVista was perfectly positioned to succeed, but the executives in charge were not. AltaVista had a technically superior search engine, but they did not manage it correctly for the time. In other words, while there was more and more interest in search, the field was anybody's for the taking. AltaVista could have made a comeback, but that opening was about to snap shut. When Larry Page and Sergey Brin founded Google in 1998, they weren't thinking about what
Starting point is 00:19:08 the marketing department wanted or how they could support a hardware team. They were thinking about the way academics organize their information. Citation analysis, a way to see which material is more cited, more relevant. And they built that approach into their PageRank algorithm. What Brin and Page did was to take that kind of logic and say, okay, we are going to look at not just the content of the page. AltaVista and the other search engines up to that time had looked at what was on the page. You know, what was the title of the page, what were the words on the page. But Google looked at the link structure behind the page and identified certain websites as hubs of knowledge. The indexing was still organic and
Starting point is 00:20:03 free in one sense, but it also tracked the value of different pages. It ranked them. This brought an elegance to search and a usefulness that it never had before. And Google brought something else too, dollar signs. Page rank aside, Cormier reminded us that Google cracked the funding problem AltaVista and the rest never quite figured out. Finding the business model, the advertising-based business model, was the thing that made it explode. And as people used it more for free, you needed more everything. More R&D, more systems, more software, more data centers, everything. And so I think once the advertising model that Google brilliantly put in place, that's really what it took for it to really become what it is today, because it's a huge capital investment and somehow you have to pay for it.
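The link-based ranking mentioned earlier in this segment, where pages are scored by who links to them and not only by their own words, can be illustrated with the simplified textbook form of PageRank. This is a sketch and not Google's production algorithm: the four-page link graph, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages; a page's score is shared among its outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: three pages all link to "hub", which links to "a".
links = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
```

In this toy graph the page that everyone links to ends up with the highest score, which is the "hub of knowledge" idea in miniature.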
Starting point is 00:21:01 It's really easy to say now, what a dumb mistake to not come up with an advertising model. But when it's never been done before, it's different, right? There was a moment that you'll remember if you're old enough. That moment in the early 2000s, when somebody said, hey, you're using the wrong search engine. And then they pointed you toward Google. Converts raced to the new king of search. And soon enough, it became literally synonymous with search. You didn't just search anymore. You Googled. Nobody was AltaVista-ing.
Starting point is 00:21:40 Those early years of competition faded into memory. Here's Van Couvering one more time. AltaVista showed that you needed to be pushing the boundaries, to be cutting edge with your technology, but they also showed the power of the business model that they did not have, that they couldn't have within the confines of that research environment. They were not able to make that work and they couldn't get out of DEC and they
Starting point is 00:22:13 couldn't get out of Compaq. Search was always going to be so much more than a side project, so much more than a curiosity in a computer company's research lab. And after Google, we all understood that. In a massively connected world flooded with content, search is what everybody's looking for. In 2010, Google Instant arrived and the search engine started predicting what you were searching for before you finished typing. Advances in AI continue to make the experience more and more intuitive. And it's really become impossible to imagine online life without these tools. But you know, the fundamentals of today's search technology
Starting point is 00:23:06 were there in AltaVista. They were imagined into being back in 1995 by pioneers like Louis Monier. He understood that no simple catalog could ever keep up with our sprawling curiosity. We probably used search engines a few hundred times just while putting this episode together. I mean, just today, I wanted to check up on our guests. And in a second, I see Louis Monier just did an interview where they call him the father of search. And Larry Page's net worth is, I'm not even going to tell you. And Paul Cormier is, all right, he's the CEO of Red Hat,
Starting point is 00:23:49 the company that brings you this podcast. Want to learn more about the history of search engines? There's a stack of bonus material waiting for you over at redhat.com slash command line heroes. Next time, we're going global, Thank you. And this is Command Line Heroes, an original podcast from Red Hat. Keep on coding. Hi, I'm Jeff Ligon. I'm the Director of Engineering for Edge and Automotive at Red Hat. Even 10 years ago, the chaos of running hundreds and thousands of containers in a cluster,
Starting point is 00:24:40 it didn't feel like you could go from that to running just dozens in a car. But these days, it's coming. In fact, containers are a big part of the future vision of software-defined vehicles. And look, if we can get the container revolution to work in cars, then everything a cloud-native developer can do today can apply to cars. This huge ecosystem of engineers
Starting point is 00:24:59 can start to write applications for automotive. We can completely change the industry. This is why Red Hat's open-source approach to edge computing is so important. The way we collaborate, the way we build together, it's already making some pretty incredible things possible. Learn more about them at redhat.com slash edge.
