1. Field of Art
The present invention generally relates to the field of search engine technologies, and more specifically, to content match engines.
2. Description of the Related Art
Conventional Search Engines
In general, an enterprise search engine is a software system that searches for documents relevant to a given query statement. The enterprise search engine typically consists of a crawler, an indexer, a searcher, and a query engine. The crawler gathers documents from pre-assigned locations and stores them in document repositories. The indexer fetches documents from the document repositories, creates indices from the documents, and stores the indices in an index database. The searcher searches the index database and returns a list of relevant documents (referenced as “hits”) in response to a specific query. The query engine parses a query expression provided by a user and sends query commands to the searcher for processing.
Consider, for example, the conventional search system 100 that is depicted in FIG. 1. The conventional search system 100 may fetch documents from one or more document sources 105(a-n) and store them in a document repository 110. The documents from the document sources 105(a-n) are indexed by a search engine 120, and the indexed documents 122 are stored in an index database 124.
Subsequently, a user 150 seeking information may use a query composer 130 to compose a query to search documents 126 in the search engine 120. The search may then be conducted by the search engine 120 against the indexed documents 122 in the index database 124. When one or more matches (i.e., “hits”) corresponding to the query are found, the search engine 120 returns the matching indexed documents as search results 135 that are presented to the user 150.
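By way of illustration only, the indexer, query engine, and searcher described above may be sketched as follows. This is a minimal sketch and not the implementation of any particular search engine: the function names (tokenize, build_index, parse_query, search), the whitespace tokenizer, the in-memory inverted index, and the sample repository are all assumptions introduced for this example, and the crawler is represented only by a repository that has already been populated.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(repository):
    """Indexer: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in repository.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def parse_query(query):
    """Query engine: turn a user query string into a list of terms."""
    return tokenize(query)


def search(index, terms):
    """Searcher: return ids of documents that contain every query term."""
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Crawler stand-in (hypothetical data): documents already gathered
# into a repository keyed by document id.
repository = {
    "doc1": "football league scores for the season",
    "doc2": "english football results",
    "doc3": "stock market scores",
}
index = build_index(repository)
hits = search(index, parse_query("football scores"))
```

In this sketch a hit is any document containing every query term, so the result says nothing about how relevant a hit is to the context of the query.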
The above-discussed search system, while an improvement over manual searching, still has various limitations. One limitation is that the indexed documents may not necessarily be relevant with respect to the context of the query. For instance, a search for documents related to National Football League scores may return results related to English football (soccer) rather than to the American football league.
More generally, conventional search systems are insufficient for finding relevant documents in many query problems. For example, consider a problem in which the relevance of two documents is measured as a percentage value and a predetermined threshold, for example, X %, is specified. Given an input document and the percentage value X %, a search for relevant documents in the document repositories is conducted so that the relevance between the input document and each of the returned documents must be greater than X %.
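The query problem above may be sketched, for illustration only, as a threshold filter over a repository. The description does not specify how relevance is measured, so this sketch assumes a simple word-set overlap (Jaccard similarity) scaled to a percentage; the function names jaccard and relevant_documents, the sample repository, and the threshold value are likewise assumptions introduced for this example.

```python
def jaccard(a, b):
    """Assumed relevance measure: shared words divided by all distinct words."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def relevant_documents(input_doc, repository, threshold_percent):
    """Return (doc_id, relevance %) pairs whose relevance exceeds X %."""
    results = []
    for doc_id, text in repository.items():
        score = 100.0 * jaccard(input_doc, text)
        if score > threshold_percent:
            results.append((doc_id, score))
    # Most relevant documents first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical repository and input document.
repository = {
    "a": "the quick brown fox",
    "b": "the quick brown dog",
    "c": "an unrelated sentence",
}
matches = relevant_documents("the quick brown fox jumps", repository, 60)
```

With a threshold of 60 %, only document "a" is returned in this sketch, since it shares four of five distinct words with the input document. The filter compares the input document against every document in the repository, which becomes expensive as the repository grows.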
The direct application of a conventional search system to the above query problem results in several disadvantages. For example, a conventional search system may lack an accurate and efficient measurement of the document relevance.
In addition, a conventional search system generally returns a large list of documents, most of which may not be relevant at all. Thus, the precision rate of retrieval is low. Returning a large list of documents is a common problem of conventional search engine technologies because a query expressed as key terms is unable to precisely describe the documents that users are trying to retrieve.
Another disadvantage of the direct application of conventional search systems is that they typically measure the relevance of documents through models that are often inaccurate or that are highly computationally intensive. Examples of these inaccurate and resource-intensive models include a term vector-space model, a probabilistic model, a latent semantic space model, and the like.
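As one illustration of such a model, a term vector-space comparison of two documents may be sketched as follows. This is a minimal sketch assuming raw term counts and cosine similarity; the function names term_vector and cosine_similarity are hypothetical, and practical systems typically apply term weighting that is omitted here.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a bag of lowercase term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a, vec_b = term_vector(a), term_vector(b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


same = cosine_similarity("football league scores", "football league scores")
disjoint = cosine_similarity("football league scores", "stock market news")
```

Because this measure depends only on which terms appear and how often, two documents that use the word "football" in different senses may still score as similar.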
Therefore, there is a need to modify and improve conventional search systems so that, in response to a query, the search system returns a precise and accurate list of documents having a high degree of relevance. In addition, there is a need to modify and improve conventional search systems to make efficient and effective use of available resources.