A vast amount of information is created daily and stored in digital form. Because the cost of digital storage has dropped precipitously in recent years, there is often little incentive to destroy “old” information as “new” information is stored. As a result, the total amount of information stored in digital form has grown, and continues to grow, very quickly.
One effect of this “information explosion” is illustrated by the Internet and World Wide Web (“Web”). In particular, while an extremely large amount of information has become available via the Web (some estimates indicate that it comprises tens of billions of Web pages), specific information can be very difficult to locate. That is, it can be challenging for users to identify and retrieve, from among the vast amount of available information, the specific information that meets their needs.
One tool that has been developed to help users locate specific information is the search engine. A search engine, when deployed on the Web, typically utilizes “web crawlers” to collect new or updated content from Web sites on a periodic basis, summarizes that content, and stores the summary information in an indexed database. A user may employ the search engine by providing criteria in the form of one or more keywords or phrases, which are used as the basis for a query that executes against the database. The results of the query are typically presented to the user in the form of hyperlinks to the actual Web pages found by a web crawler.
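The crawl-summarize-index-query pipeline described above can be sketched in miniature. The code below is purely illustrative (the page contents, tokenizer, and AND-query semantics are assumptions, not a description of any particular search engine); a real engine would crawl live sites and maintain a far richer index.

```python
from collections import defaultdict

# Hypothetical "crawled" content: URL -> page text collected by a web crawler.
pages = {
    "http://example.com/a": "patent search engine for legal case records",
    "http://example.com/b": "medical records for patients and consultations",
    "http://example.com/c": "search engine ranking of web pages",
}

# Build an inverted index: each keyword maps to the set of URLs containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

def query(*keywords):
    """Return the URLs containing every keyword (a simple AND query)."""
    matches = [index.get(w.lower(), set()) for w in keywords]
    return set.intersection(*matches) if matches else set()

# The results would then be presented to the user as hyperlinks.
print(sorted(query("search", "engine")))
```

Here the query for "search" and "engine" matches pages a and c but not b, mirroring how a user's keywords are executed against the indexed database.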
The amount of information stored in many on-line “private” data collections (i.e., those accessed only by specific individuals, such as a business's database that is accessed by employees) has also grown tremendously, leading many organizations that maintain such collections to make search engines available to their users. In particular, many organizations maintain a private collection of data relating to offline endeavors, and make a search engine available so that employees may quickly locate information on those endeavors when needed. For example, a law firm may maintain a database of records that each pertain to a particular case and/or client, or a doctor's office may maintain a database of records that each relate to a particular patient and/or consultation. Employees within the law firm or doctor's office may quickly access information on a particular endeavor by providing to the search engine one or more keywords or phrases that characterize the endeavor. The search engine (and/or an integrated interface) may generate a query based on the keyword(s) or phrase(s), and the query may be executed against the database. Results of the search may be provided to the user in the form of hyperlinks (e.g., to web pages that each display information stored in a record, such as further detail on the endeavor), and a summary representation of each result may be included as well.
Search engines used with either public or private data collections can suffer from various deficiencies. In particular, many search engines include a large number of irrelevant results among those presented to a user, and many fail to identify results that are relevant. Irrelevant results (i.e., those which do not meet the user's needs with respect to the search) may be produced because the quantity of data being searched is so vast that many records in the data collection match the criteria provided by the user. As a result, many results presented to the user may include the keyword(s) or phrase(s) provided, yet bear only a tangential or peripheral relationship to the information actually sought, or no relationship at all. This may occur because the user is unable to conceive of keyword(s) or phrase(s) that identify the information sought without also identifying irrelevant information, or because the search engine is unable to properly classify results. Regardless, because a user's capacity to review search results is generally limited (one estimate indicates that the average user is willing to examine only the first 10-20 results), many users may be dissatisfied with the results identified. Moreover, as the amount of data stored in the typical data collection increases, so does the probability that less relevant or irrelevant results will “wash out” the results that interest the user.
A variety of attempts have been made to identify, and present first, the search results that are most relevant to the user. For example, the Google search engine ranks the relevance of search results based in part on the quantity of other Web pages that link to each page represented by a result, on the theory that results others find important are likely to interest the user. Other search engines, such as meta-crawler search engines, aggregate results identified by multiple search engines, so that results deemed relevant by a plurality of search engines are designated most relevant overall.
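The link-based ranking idea can be illustrated with a toy sketch. This is not Google's actual algorithm (PageRank weights links recursively, among other refinements); it simply orders results by a raw count of inbound links, using a made-up link graph.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": ["page_b"],
    "page_d": ["page_c"],
}

# Count inbound links: how many other pages link to each page.
inbound = {}
for source, targets in links.items():
    for target in targets:
        inbound[target] = inbound.get(target, 0) + 1

def rank(results):
    """Order search results so the most-linked-to pages appear first."""
    return sorted(results, key=lambda page: inbound.get(page, 0), reverse=True)

print(rank(["page_a", "page_b", "page_c"]))
```

In this graph, page_c has three inbound links, page_b has two, and page_a has none, so a result set containing all three would be presented to the user in that order.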
Ultimately, these and other attempts have only marginally improved the identification and delivery of relevant search results. As a result, users spend more time than necessary searching for relevant information. This is costly for the business community and frustrating for users.