In general, a search engine can be used to locate, retrieve, and/or present information contained within a corpus. A corpus can include, but is not limited to, all documents accessible via an electronic network such as the Internet, a set of documents related to a specific topic, one or more periodicals, or any other set or subset of electronic information or data objects. Search engines generally receive search requests (or queries) from a search engine user (or searcher) via a search engine user interface. Traditional search engines parse the search request and implement a binary matching algorithm to identify documents that contain one or more search terms from the parsed search request. The binary matching algorithm can identify documents based on a keyword index associated with each document. The search engine returns the identified documents to the user based solely on the existence of binary matches. Alternatively, some search engines return the identified documents and present them based on both the existence and the number of binary matches.
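By way of illustration only, the binary matching approach described above can be sketched as follows. The sketch assumes whitespace tokenization and a small in-memory corpus; the function names (build_keyword_index, binary_match, search) and the sample documents are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def build_keyword_index(corpus):
    """Map each keyword to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():  # assumption: whitespace tokenization
            index[term].add(doc_id)
    return index


def binary_match(index, query):
    """Return {doc_id: number of distinct query terms found in that document}."""
    counts = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    return dict(counts)


def search(index, query, rank_by_count=False):
    """Return matching document ids.

    With rank_by_count=False, a document is returned merely because at least
    one binary match exists. With rank_by_count=True, documents are ordered
    by the number of binary matches, highest first.
    """
    matches = binary_match(index, query)
    if rank_by_count:
        return sorted(matches, key=lambda d: (-matches[d], d))
    return sorted(matches)


# Hypothetical three-document corpus.
corpus = {
    "doc1": "jaguar speed on land",
    "doc2": "jaguar car top speed",
    "doc3": "python snake habitat",
}
index = build_keyword_index(corpus)
existence_only = search(index, "jaguar car")
ranked = search(index, "jaguar car", rank_by_count=True)
```

In this sketch, a request for "jaguar car" returns both doc1 and doc2 even though doc1 concerns the animal rather than the vehicle, which illustrates how matching on keywords alone can surface irrelevant documents.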
In an environment where vast quantities of data and information must be quickly searched for meaningful content, relying solely on keyword indexing and binary matching can lead to a large number of spurious and/or irrelevant results. The volume of current data and information and the rate of its growth make the continued use of this approach intractable.
Many methods have been proposed to improve the quality of search results returned by search algorithms. One such method involves routing search requests through a static ontology. The static ontology creates variations of the search request, which in turn retrieve information from previously categorized information sources within a corpus. Another proposed method involves the use of natural language processing and other linguistic techniques to provide better document indexing. Another proposed method involves the use of inference engines to pre-process search requests before they are issued. Yet another proposed method involves the use of meta-search engines. Meta-search engines submit search requests to multiple third-party search engines and provide a consolidated search result using a variety of heuristic and statistical methods.
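As a non-limiting illustration of how a meta-search engine might consolidate results, the following sketch merges ranked lists from several engines using reciprocal rank fusion. Reciprocal rank fusion is only one possible heuristic, chosen here for brevity; the text above does not specify which heuristic or statistical methods are used. The function name fuse_rankings, the constant k = 60, and the sample result lists are assumptions made for this example.

```python
from collections import defaultdict


def fuse_rankings(rankings, k=60):
    """Merge several ranked result lists into one consolidated ranking.

    Each document receives 1 / (k + rank) from every list in which it
    appears (rank starts at 1); documents are ordered by total score.
    Assumes no document id is repeated within a single list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked results returned by three third-party engines.
engine_a = ["d1", "d2", "d3"]
engine_b = ["d2", "d1", "d4"]
engine_c = ["d2", "d3", "d1"]
fused = fuse_rankings([engine_a, engine_b, engine_c])
```

A document that ranks highly in several engines (d2 in this sketch) rises to the top of the consolidated list, while a document returned by only one engine (d4) falls to the bottom.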
Another proposed method for improving the effectiveness of search engines involves creating inverted indices, linked lists of terms, and/or expanded terms and searching a database of surrogates instead of the documents themselves. Yet another proposed method uses hypertext metadata as a surrogate to index documents, taking advantage of the embedded Hypertext Markup Language (HTML) standard. Yet another proposed method utilizes rule-based classification dependent upon an expert rule. This method is similar to the inference engines in that search requests are pre-processed before they are submitted. Another proposed method involves the use of structured languages and Boolean logic to search a corpus. Yet another proposed method involves the use of multiple indices to cross-map ambiguous search requests. Another proposed method involves the use of Bayesian networks to improve the relevance of document classifiers. Yet another proposed method involves latent semantic indexing of text in multiple languages to determine the relatedness of documents in hyper-geometric space. Another proposed method involves the statistical analysis of bulk text and the creation of topic paths to classify documents.
While some of the methods described above have increased the effectiveness of search engines, many problems still exist. Specifically, search engines still return an overwhelming number of irrelevant and/or spurious documents in response to a typical search request. Thus, there is a need for a search engine that maximizes the relevance of the search results.