The Internet is a vast, global network of countless computers, networks, routers and data lines. It was created for the U.S. Department of Defense (DoD) in the 1970's. The Department of Defense needed to establish a research network to link computers in universities, research labs and government centers across the country. The DoD network was opened to the public in the 1980's when the National Science Foundation (NSF) established its own network, the NSFNET, based on the existing network structure. Administration of the backbone structure for the Internet and domain name registrations was eventually transferred to private companies, as the Internet was opened to commercial usage in the 1990's.
Since 1995, the growth of the Internet has been phenomenal. The Internet connects users with the plethora of sites on the network having information content principally through the World Wide Web (WWW), a system of site addressing using Uniform Resource Locators (URLs). As the number of sites has grown exponentially, search services have arisen as the key entry points to the Internet for the millions of users searching for content among hundreds of millions of sites on the Web. The number of search services has expanded from a handful in 1995 to over 500 in 1998.
Search services distinguish themselves by the extent of sites that they have indexed and by the efficiency with which they can find and list relevant sites for a user in response to a search query. There are two general types of search methodologies that have evolved: the index or Boolean search, and the category or directory search.
The index or Boolean search allows the user to enter one or more keywords, which may be qualified by Boolean operators, in order to locate relevant content by matching the keywords with those appearing in the content. Because the total data volume of content is prohibitively large, search services will maintain listings of summaries of content provided by the content providers themselves and/or will generate abstracts of content using automated "spiders" or "robots" which systematically search through the Internet for content. The latter type of utility program is designed to jump from one Internet site address to another collecting information on the data it encounters.
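By way of illustration only, a Boolean keyword search over stored content summaries might be sketched as follows. The site names, the summaries, and the helper names `build_index` and `search` are hypothetical and are not taken from any particular search service; the sketch assumes that each summary is a short plain-text string.

```python
from collections import defaultdict

# Hypothetical summaries keyed by site address; invented for illustration.
SUMMARIES = {
    "site-a": "koa wood furniture handmade in hawaii",
    "site-b": "oak wood flooring and furniture",
    "site-c": "hawaii travel guide",
}

def build_index(summaries):
    """Map each keyword to the set of sites whose summary contains it."""
    index = defaultdict(set)
    for site, text in summaries.items():
        for word in text.lower().split():
            index[word].add(site)
    return index

def search(index, all_of=(), any_of=(), none_of=()):
    """Apply AND (all_of), OR (any_of) and NOT (none_of) to the index."""
    # Start from every indexed site, then narrow the set step by step.
    result = set().union(*index.values())
    for term in all_of:
        result &= index.get(term.lower(), set())
    if any_of:
        result &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in none_of:
        result -= index.get(term.lower(), set())
    return sorted(result)

INDEX = build_index(SUMMARIES)
```

For example, `search(INDEX, all_of=["wood"], none_of=["oak"])` narrows the listings to sites that mention "wood" but not "oak".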
An advantage of the index or Boolean search is the ability to find relevant content using a Boolean syntax to help narrow the search. This type of search is beneficial when locating content that can be pinpointed by keywords. The downside of this method is the potential number of items that may be found if the search parameters are not sufficiently narrowed. To reduce confusion from overly large search finds, some index search services have developed methods for ranking the search "hits" based upon various types of relevancy indicators.
No two index search services are the same. How they search for content with the use of spiders or robots and how their listings are compiled in their database can be vastly different. Some services consider words in a Web site's "title" and "description" and "keyword" meta tags of primary relevance in finding a match. Other search services may disregard meta tags and focus on the content of information in the Web site itself. Generally, they will grab a page or two of text and rank the content based on the occurrence of specific words that appear in the content. For example, a Web page which mentions "koa wood" multiple times may be indexed or ranked high for relevancy in a search for "koa wood".
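The occurrence-based ranking described above can be sketched in a simplified form. This is an assumption-laden illustration, not the method of any actual service: the page texts are invented, the helper names are hypothetical, and a plain substring count stands in for whatever matching a real service performs.

```python
# Hypothetical page texts keyed by address; invented for illustration.
PAGES = {
    "page-1": "Koa wood bowls. Our koa wood is sustainably harvested koa wood.",
    "page-2": "We sell koa wood and oak.",
    "page-3": "Travel tips for Hawaii.",
}

def phrase_count(text, phrase):
    """Count non-overlapping, case-insensitive occurrences of the phrase."""
    # Note: substring matching would also count "koa woodwork" as a hit.
    return text.lower().count(phrase.lower())

def rank_by_frequency(pages, phrase):
    """Return matching page addresses, most frequent mention first."""
    scored = [(phrase_count(text, phrase), url) for url, text in pages.items()]
    matches = [(count, url) for count, url in scored if count > 0]
    # Sort by descending count, then by address to keep the order stable.
    return [url for count, url in sorted(matches, key=lambda cu: (-cu[0], cu[1]))]
```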
A relevancy ranking may be quantified by some services in terms of percentages, with listings assigned higher percentages listed higher in a search report than those assigned lower percentages. This provides the user with a scale of relative measurement. However, it can result in a Web site assigned a low ranking receiving few or no visits. Index search services can also access listings from multiple cooperating databases and combine the results in a single search report as if from a single large database. An example of a system for combining the search results of multiple databases is described in U.S. Pat. No. 5,659,732 in the name of S. T. Kirsch, assigned to Infoseek Corporation, Santa Clara, Calif.
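A percentage-based ranking and the merging of results from several databases can be illustrated with the following sketch. It is not the method of the cited patent. It assumes, purely for illustration, that each database returns raw scores, that a percentage is computed relative to the top score within that database, and that a site appearing in more than one database keeps its highest percentage. All names and data are hypothetical.

```python
# Hypothetical raw scores returned by two cooperating databases.
DB_A = {"site-a": 6, "site-b": 3}
DB_B = {"site-b": 3, "site-c": 4}

def to_percentages(hits):
    """Scale raw scores so that the best hit in a database rates 100."""
    if not hits:
        return {}
    top = max(hits.values())
    if top <= 0:
        return {url: 0 for url in hits}
    return {url: round(100 * score / top) for url, score in hits.items()}

def merge_reports(*databases):
    """Combine several result sets into one report, highest percentage first."""
    merged = {}
    for hits in databases:
        for url, pct in to_percentages(hits).items():
            # Assumption: a site listed twice keeps its best percentage.
            merged[url] = max(pct, merged.get(url, 0))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```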
Some search services also take into consideration the number of links from other sites pointing at a particular site in determining its importance. Two Web sites with generally the same frequency of the words "koa wood" might be ranked differently by some search services based on the number of other Web sites which make reference to each site's URL address. Such services assume that if a site has several referral links pointing to it, it probably contains relevant information and is of higher value. An example of a system for ranking site listings by how often they are referenced by other sites is described in U.S. Pat. No. 5,748,954 in the name of M. L. Mauldin, assigned to Carnegie Mellon University, Pittsburgh, Pa.
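The use of referral links to separate otherwise equally ranked sites might be sketched as below. This is a simplified illustration and not the method of the cited patent; it assumes that keyword frequency is the primary sort key and that the count of distinct referring sites is used only as a tie-breaker. The data and function names are hypothetical.

```python
# Hypothetical keyword counts per site and outbound links per referring site.
TERM_COUNTS = {"site-a": 5, "site-b": 5, "site-c": 2}
LINKS = {
    "site-c": ["site-b"],
    "site-d": ["site-b", "site-a"],
    "site-e": ["site-b"],
}

def inbound_link_counts(links):
    """Count how many distinct other sites link to each target site."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore links from a site to itself
                counts[target] = counts.get(target, 0) + 1
    return counts

def rank(term_counts, links):
    """Order sites by keyword count, then by number of referring sites."""
    inbound = inbound_link_counts(links)
    return sorted(
        term_counts,
        key=lambda url: (-term_counts[url], -inbound.get(url, 0), url),
    )
```

Here "site-a" and "site-b" mention the keywords equally often, so the three referring sites pointing at "site-b" place it first.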
Other factors considered in determining a Web site's ranking include verification of matches between the keyword meta tag data and the actual content in a Web site's document. If there is no clear association between the hidden keyword meta tag data and the content data, a site might be marked irrelevant and ranked low in a search. Another negative factor might be the overuse of certain keywords in a Web site. Repeating "koa wood" multiple times in either the keyword meta tag or in the document itself can be considered "spamming", i.e., the repeated use of words at a frequency that the spider or robot identifies as overly repetitive. If a robot or spider detects blatant "spamming", the search service may penalize the Web site by giving it a lower relevance value in search results or may even remove the Web site from its database. For Web site designers and publishers, it is critical to present site content in a manner that increases the likelihood that it will receive a high ranking in a search, while at the same time avoiding the kind of over-manipulation of content that may be rejected.
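The two checks described above, namely verifying keyword meta tag data against the document content and detecting overly repetitive keywords, might be sketched as follows. The density threshold of 0.3 is an arbitrary assumption made for illustration, and the function names are hypothetical; actual services do not publish their criteria.

```python
import re

def words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_density(body, phrase):
    """Fraction of the body's words taken up by occurrences of the phrase."""
    body_words, phrase_words = words(body), words(phrase)
    if not body_words or not phrase_words:
        return 0.0
    n = len(phrase_words)
    occurrences = sum(
        1
        for i in range(len(body_words) - n + 1)
        if body_words[i:i + n] == phrase_words
    )
    return occurrences * n / len(body_words)

def looks_like_spam(body, phrase, max_density=0.3):
    """Flag content whose keyword density exceeds an assumed threshold."""
    return keyword_density(body, phrase) > max_density

def unsupported_meta_keywords(meta_keywords, body):
    """Return meta keywords whose words do not all appear in the body."""
    body_set = set(words(body))
    return [kw for kw in meta_keywords if not set(words(kw)) <= body_set]
```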
In contrast to index search services, category or directory search services group Web site content into specific categories, like an encyclopedia. Instead of typing in keywords to locate specific information, the user selects a category of interest from a list. Finer-grained levels of subcategories in a hierarchy may be assigned in order to break down the listings in large categories into more manageable lists for the user. The definitions of categories and subcategories are chosen by each search service and are to a large extent arbitrary. The category search service collects Web site listing information supplied by human editors, which is reviewed and placed into the appropriate categories. This is a time-consuming task considering that there are often thousands of new Web site entries per day handled by major search services. The heavy volume of Web site listings has caused most category search services to take weeks, months and even years to list a robust enough set of available Web site entries.
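A hierarchy of categories and subcategories of this kind can be modeled as a simple tree. The sketch below is illustrative only; the class name `Category`, its methods, and the sample listings are hypothetical, and the placement of each listing is assumed to have been decided by a human editor.

```python
class Category:
    """A node in a directory: named subcategories plus site listings."""

    def __init__(self, name):
        self.name = name
        self.subcategories = {}
        self.listings = []

    def add_listing(self, path, title):
        """File a site title under the category path chosen by an editor."""
        node = self
        for part in path:
            node = node.subcategories.setdefault(part, Category(part))
        node.listings.append(title)

    def browse(self, path):
        """Return the subcategory names and listings found at a path."""
        node = self
        for part in path:
            node = node.subcategories[part]
        return sorted(node.subcategories), sorted(node.listings, key=str.lower)

# Hypothetical directory contents, invented for illustration.
root = Category("Top")
root.add_listing(["Shopping", "Woodworking"], "Koa Wood Gallery")
root.add_listing(["Shopping", "Woodworking"], "Aloha Bowls")
root.add_listing(["Travel"], "Hawaii Guide")
```

A user browsing from the top level would first see the categories "Shopping" and "Travel", and would reach individual listings only after selecting a subcategory.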
When a Web site is placed in a category, it is usually sorted with the other listings in alphabetical order. This can be an advantage or a disadvantage, depending upon a Web site's alphabetical title position. Because category services rely on human entry of Web site listings, there is usually no automatic review of a Web site for current status or relevance, and many sites can become defunct or not be updated for years. Some category services have recently combined the category method with a ranking system to assign a highlighted mark, higher position or relevancy measure to Web sites deemed to be of higher value. A Web site having a title late in the alphabet and without a highlighted status will be relegated to a lower portion of the list and will be less attractive and more difficult to locate than others. Having the search service determine what should be highlighted can lead to arbitrary rankings and takes a Web site's success in attracting visitors out of its publisher's hands.
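The effect of alphabetical ordering combined with a highlighted status can be shown with a short sketch. It assumes, for illustration only, that highlighted listings are placed ahead of all others and that each group is then sorted by title; the listing data and the function name are hypothetical.

```python
# Hypothetical listings as (title, highlighted) pairs.
LISTINGS = [
    ("Zebra Koa Works", False),
    ("Aloha Bowls", False),
    ("Maui Woodcraft", True),
]

def order_listings(listings):
    """Place highlighted titles first, then sort each group alphabetically."""
    ordered = sorted(listings, key=lambda item: (not item[1], item[0].lower()))
    return [title for title, highlighted in ordered]
```

With this data, "Zebra Koa Works" is listed last because it is neither highlighted nor early in the alphabet.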
Currently, most major search services combine some form of both the index and the category methods to meet user preferences. This allows each type of service to keep existing users, or attract new ones, who might otherwise prefer a different service for a more targeted search function. As a result, users generally find the benefits and disadvantages of both types of services to be about the same. For the subscriber, each type of service entails some degree of arbitrariness, either in the factors selected to compute a relevancy ranking or in the subjective determination of a site's relevancy.
How high or prominently a Web site is ranked by a search service is directly related to the frequency of visits or "hits" it receives from a search. Generally, the more hits a site has, the more potential inquiries or transactions will occur. In order to achieve positive search results among the well over 100 million Web pages that are currently publicly available, Web site developers need to pay constant attention to the content as well as to the structure and frequency of their Web site submissions. It is not uncommon for Web sites to spend hundreds of dollars to promote their site to search services. Thus, the Internet searching and indexing industry at present is characterized by high opportunity and maintenance costs for results that are arbitrary or uncertain for the subscriber. These conditions may become increasingly unacceptable as the volume of Web sites, number of subscribers, level of commerce, and the costs involved continue to increase.