Although it has exhibited explosive growth and extensively impacted the worlds of information and commerce, the globally accessible computer network known as the Internet has effectively become an unstructured victim of itself. Internet information usage has largely lost its utility because traditional search engines can neither access the vast available information pool nor qualify it adequately. The best current search engine can keep track of and access only a small fraction of Internet World Wide Web pages (i.e., about one billion of 550 billion available documents, or roughly 0.2 percent). The accessible sites are categorized in rudimentary fashion using keywords rather than intelligent assessment of content. Searches for information, even though limited to this small fraction of the available information, commonly return thousands, and often millions, of irrelevant responses.
Information collection and distribution on the Internet take place as follows. A conventional Internet search engine uses software (called “spiders”) that roams the Web to gather information, which is distilled, indexed, and cataloged in a central database. An Internet search conducted by a Web user of that search engine produces results that come from the database, not from the Internet itself. The results produced are references to Internet addresses, thereby requiring the Web user to open multiple sites in search of the information.
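The conventional arrangement described above can be illustrated with a minimal sketch. It is not the implementation of any particular search engine: the in-memory `WEB` dictionary, the example addresses, and the function names `crawl`, `build_index`, and `search` are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the Web: address -> page text (assumed data).
WEB = {
    "http://a.example/": "fresh apples for sale",
    "http://b.example/": "apple computers and software",
    "http://c.example/": "weather report for today",
}


def crawl(web):
    """Play the role of the spider: visit each page and yield (address, text)."""
    for address, text in web.items():
        yield address, text


def build_index(pages):
    """Distill crawled pages into a central keyword index: word -> addresses."""
    index = defaultdict(set)
    for address, text in pages:
        for word in text.lower().split():
            index[word].add(address)
    return index


def search(index, query):
    """Answer a query from the index alone, never from the live pages.

    Returns the sorted addresses whose indexed text contains every query word.
    """
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# The index is a snapshot taken at crawl time.
INDEX = build_index(crawl(WEB))
```

In this sketch, `search(INDEX, "apples")` returns only the `http://a.example/` address. If a page in `WEB` is changed after the index is built, the same query returns the same result until the spider runs again, because the results come from the database and not from the pages themselves. The results are also addresses only, so the user must still open each one to find the information.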
Current search engines do not include an ability to mass-search all sites and retrieve and organize the search results by content; therefore, searches are applied to all accessible information, irrespective of whether it is relevant. The result is a largely ineffective search engine effort and non-responsive returns on search queries. Examples of such traditional search engines include Northern Light™, Snap™, Alta Vista™, HotBot™, Microsoft™, Infoseek™, Google™, Yahoo™, Excite™, Lycos™, and Euroseek™.
The conventional search technology is, therefore, based on a model in which the indexes, references, and actual data (in the case of commerce networks) are centralized. All queries take place at central sites, and the data distributed are not updated in real time (and are typically stale) and usually require reformatting. The Internet is at best a frustrating search environment because the data reside in multiple formats and in a distributed world.
For applications in commerce, the existing Internet architecture can accommodate only a small fraction of the business participation that would otherwise be available to produce consumer benefits arising from competition. As a consequence, the Internet effectively serves only the large dominant players while excluding everyone else. Part of the e-commerce perception is that virtually anything can be purchased over the Internet. While the perception is accurate, it ignores the fact that bias in the current system locks out a much greater part of the marketplace than it serves. Business-to-business commercial utilization of the Internet consists largely of e-mail communications.
For applications in delivery of services, particularly as various governmental entities have attempted to use the Internet, the lack of sensible structure is especially notable. These situations do not exist through the fault or incompetence of users but again stem from an inherent and systemic limitation of the “centralized” Internet.
The efforts of traditional search sites to retain and attract more consumer attention and thereby generate more advertising revenue have caused the attempt to centralize all online information to rise to the point of conflict. As stated above, the growth in the volume and the diversity of Internet content now leads to searches generating thousands of pages of results that encompass only a fraction of the overall body of relevant information. The market needs access to additional organizational structures, but the current system makes these requirements impossible to meet. Traditional search sites are designed and predicted to lead to further centralization, which will exacerbate the information accessibility problem.
Conventional wisdom has been that speed can offset the growth of Internet information. The industry emphasis has been on hardware improvements rather than next-generation software. Five years ago, a state-of-the-art personal computer used a 166 MHz microprocessor chip. Currently, 800 MHz microprocessor chips are standard, and 1,000 MHz microprocessor chips are expected to be available soon. Ironically, while currently available machines can search for information much more quickly, they also create information at a rate consistent with their speed. They are in effect helping the problem keep pace with the solution. Insofar as emphasis has been placed on software, it has been to improve applications within the current architecture or to offer and market e-commerce alternatives within the current architecture. As a consequence, all such efforts are impeded before they begin.
Because of the sheer size of the Internet and because the spiders operate from a central location, the spiders can cover only a small fraction of the entire Internet. The resulting database of search results is inherently limited not only in size but also in freshness. The required tradeoffs are self-defeating. Making the database broader and deeper would require excessive “roaming” time so that the information would become stale. Keeping the information fresh would require searching a smaller fraction of the available Internet documents, thereby making the results less comprehensive.
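The tradeoff between breadth and freshness can be made concrete with a rough calculation. The sketch below is a simplified model and not a measurement: it assumes a spider that revisits its covered pages in a fixed round-robin order at a constant rate, and the rate of ten million pages per day is a hypothetical figure chosen only for illustration. The function names are likewise hypothetical.

```python
def revisit_cycle_days(pages_covered, pages_per_day):
    """Days needed for one full pass over the covered pages."""
    return pages_covered / pages_per_day


def average_age_days(pages_covered, pages_per_day):
    """Mean age of an index entry under round-robin revisiting.

    Each entry is between zero and one full cycle old, so the mean age is
    half the cycle length.
    """
    return revisit_cycle_days(pages_covered, pages_per_day) / 2


ASSUMED_RATE = 10_000_000        # pages per day; hypothetical figure
INDEXED = 1_000_000_000          # about one billion pages, as stated above
AVAILABLE = 550_000_000_000      # about 550 billion documents, as stated above

narrow_cycle = revisit_cycle_days(INDEXED, ASSUMED_RATE)
broad_cycle = revisit_cycle_days(AVAILABLE, ASSUMED_RATE)
```

Under these assumptions, one pass over one billion pages takes 100 days, for an average entry age of 50 days, while one pass over all 550 billion documents would take 55,000 days. At a fixed roaming rate, widening coverage lengthens the cycle in direct proportion, and shortening the cycle requires covering proportionally fewer pages.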
Total information is now growing at an exponential rate. Most of the new information winds up in the inaccessible category. There is no assurance that updated information will “bump” outdated information from the accessible information pool. The average age of newly returned World Wide Web links is 186 days. The milieu is frequently one of old information, insufficient information, disorganized information and, in short, unmanageable information. There is a pressing need, therefore, to fold the existing Internet into a new world of efficient organization that will competently manage future generations of growth.