Originally published on July 27, 2000 and brought to you today as a time capsule.
The Internet is far larger than previously believed, and the Web is expanding so fast that even the best search engines can only scratch its surface, according to a new study.
The study was conducted by BrightPlanet, a South Dakota-based startup. According to its findings, there is a “vast reservoir of Internet content” hidden deep in the recesses of the World Wide Web. “Searching on the Internet today can be like dragging a net across the surface of the ocean,” the researchers said.
BrightPlanet said the volume of these deep pockets of information is 500 times bigger than the “known surface” of the World Wide Web. Earlier studies estimated that about one billion publicly accessible documents exist on the Internet, but the new study puts the number at more than 550 billion.
Billions and Billions
According to the study, “There are literally hundreds of billions of highly valuable documents hidden in searchable databases that cannot be retrieved by conventional search engines.”
Experts may dispute the study’s numbers, but it is generally agreed that search technology has not kept pace with the Internet’s frenetic expansion.
Full-text search engines build their listings in two ways. Site developers can submit addresses to an engine for indexing, or the engine can use “spiders,” programs that depend on links from existing sites to discover new ones.
However, there is a great deal of specific information on the Internet — much of it valuable to researchers and other scholars — that may have few, if any, links. Without such links, search engines find new sites only by pure chance.
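The link-following discovery the article describes can be illustrated with a minimal sketch. This toy crawler (the URLs and page contents below are hypothetical, invented purely for illustration) walks a small in-memory “web” by following hyperlinks from a seed page; a page with no inbound links is simply never found, which is exactly the blind spot the study points to.

```python
from html.parser import HTMLParser

# A toy in-memory "web": URL -> HTML. All names here are hypothetical.
PAGES = {
    "http://example.com/": '<a href="http://example.com/news">news</a>',
    "http://example.com/news": '<a href="http://example.com/">home</a>',
    # A valuable research page that nothing else links to:
    "http://archive.example.edu/db": "<p>Valuable records</p>",
}

class LinkExtractor(HTMLParser):
    """Collect the href targets of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed):
    """Breadth-first spider: pages are discovered only via links."""
    seen, frontier = set(), [seed]
    while frontier:
        url = frontier.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(PAGES[url])
        frontier.extend(parser.links)
    return seen

discovered = crawl("http://example.com/")
# The unlinked archive page never enters `discovered`: with no inbound
# links, link-following alone cannot reach it.
```

Running the crawl from the seed discovers the two interlinked pages but never the archive, no matter how many times the spider revisits the site.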
Also, governments, universities and corporations are storing more and more information in huge databases on “dynamic” pages that cannot be accessed using conventional search engines, which only identify “static” pages. Conventional search engines can lead users to the databases, but further queries are usually required before more information can be called up.
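The static-versus-dynamic distinction can be sketched the same way. In this hypothetical example (the site, records, and query strings are all invented), a database’s front page is a static form with no hyperlinks to any result, so a link-following crawler indexes only the form; each record materializes only when a user supplies a query.

```python
# Hypothetical database site: records exist only behind a query.
RECORDS = {
    "q=whales": "Whale migration data",
    "q=tides": "Tide tables",
}

def render(url):
    """Serve a page: the front page is static, results are dynamic."""
    if url == "http://db.example.gov/":
        # A search form -- note there are no <a> links to results.
        return "<form action='/search'><input name='q'></form>"
    if url.startswith("http://db.example.gov/search?"):
        query = url.split("?", 1)[1]
        return RECORDS.get(query, "No results")
    return ""

# A conventional crawler starting at the front page can index only
# the static form; every record stays invisible to it.
indexed = {"http://db.example.gov/": render("http://db.example.gov/")}

# A human (or a deep-Web tool) issuing a query retrieves the content.
result = render("http://db.example.gov/search?q=whales")
```

The crawler’s index contains the form markup and nothing else; the records are reachable only through the extra query step the article describes.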
Because most government and academic organizations do not operate like businesses — which seek out listings on as many search engines as possible — a great deal of information in the Web’s “dark corners” remains undiscovered, the study says.
Industry in Flux
Internet searching is a highly competitive industry, with new engines popping up every day and others folding, merging or being swallowed by bigger ones. Estimates of the number of search engines run as high as 1,800. Experts agree there is a vast amount of material these engines never reach.
Recent studies from researchers Lawrence and Giles and the Digital Systems Research Center found that no search engine indexes more than one-third to one-half of the publicly available documents on the Internet. According to those studies, there are an estimated 320 million documents on the Internet, but the researchers said the “true size” of the Web is almost certainly much larger.
The BrightPlanet findings show that more than 100,000 “deep” Web sites exist today, and that they tend to be narrower in focus but deeper in content than conventional surface sites.