1. Field of Art
The present invention generally relates to the field of search engine technologies, and more specifically, to the field of enterprise search engines for querying relevant documents from a document repository.
2. Description of the Related Art
In general, an enterprise search engine is a software system that searches for documents relevant to given query statements. The enterprise search engine typically consists of a crawler, an indexer, a searcher, and a query engine. The crawler gathers documents from pre-assigned locations and stores them in document repositories. The indexer fetches documents from the document repositories, creates indices from them, and stores the indices in an index database. The searcher searches the index database and returns a list of relevant documents (referred to as “hits”) in response to a specific query. The query engine parses a query expression provided by a user and sends query commands to the searcher for processing.
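The four components described above can be illustrated with a toy in-memory pipeline. This is a minimal sketch under stated assumptions, not the architecture of any particular product: the locations, document IDs, and function names (`crawl`, `build_index`, `search`, `run_query`) are hypothetical, the "index database" is a simple inverted index held in a dictionary, and a query expression is assumed to be a list of terms that must all appear in a hit.

```python
from collections import defaultdict

# Hypothetical pre-assigned locations; each maps document IDs to raw text.
LOCATIONS = {
    "share/docs": {"d1": "enterprise search engine design",
                   "d2": "crawler and indexer overview"},
    "wiki": {"d3": "search engine query parsing"},
}


def crawl(locations):
    """Crawler: gather documents from every location into one repository."""
    repository = {}
    for docs in locations.values():
        repository.update(docs)
    return repository


def build_index(repository):
    """Indexer: build an inverted index mapping each term to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in repository.items():
        for term in text.lower().split():  # assumed whitespace tokenizer
            index[term].add(doc_id)
    return index


def search(index, terms):
    """Searcher: return the sorted IDs of documents containing every term (the hits)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def run_query(index, expression):
    """Query engine: parse an assumed implicitly AND-ed term list and pass it to the searcher."""
    terms = expression.lower().split()
    return search(index, terms)


repository = crawl(LOCATIONS)
index = build_index(repository)
print(run_query(index, "search engine"))  # ['d1', 'd3']
```

Even this toy tokenizer, which splits text on whitespace, presumes a language that separates words with spaces, a small instance of the language dependence of conventional engines.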
Conventional search engine technologies are insufficient for many query problems. For example, consider a problem in which the relevance between two documents is expressed as a percentage and a threshold value, for example X %, is predetermined. Given an input document and the threshold X %, the document repositories are searched so that the relevance between the input document and every returned document is greater than X %.
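The threshold query problem above can be illustrated with a brute-force sketch. The text does not specify how relevance is measured; as an assumption, this sketch uses cosine similarity over term-frequency vectors (a conventional vector-space measure, not the method of the present invention), scaled to a percentage. The names `relevance_percent` and `find_relevant` and the sample repository are hypothetical.

```python
import math
from collections import Counter


def relevance_percent(doc_a, doc_b):
    """Assumed relevance measure: cosine similarity of term-frequency vectors, as a percentage."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return 100.0 * dot / norm if norm else 0.0


def find_relevant(input_doc, repository, x_percent):
    """Return sorted IDs of documents whose relevance to input_doc is strictly greater than x_percent."""
    return sorted(
        doc_id for doc_id, text in repository.items()
        if relevance_percent(input_doc, text) > x_percent
    )


# Hypothetical repository for illustration.
repository = {
    "d1": "search engine index",
    "d2": "search engine crawler",
    "d3": "cooking recipes",
}
print(find_relevant("search engine index", repository, 50.0))  # ['d1', 'd2']
```

The sketch scans and scores every document in the repository for each input document, so its cost grows linearly with repository size.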
Directly applying a conventional search engine to the above query problem has several disadvantages. First, there is no accurate and efficient measure of document relevance. Second, conventional systems return a large list of documents, most of which may not be relevant at all, so the retrieval precision is low. Returning a large list of documents is a problem common to all conventional search engine technologies, because a query expressed as key terms cannot precisely describe the documents that users are trying to retrieve.
Besides returning a large number of irrelevant documents, conventional search engines have another problem: they are language dependent. For each written language, a conventional search engine must implement separate language parsers and analyzers, which consumes substantial resources and is generally inefficient.
Yet another problem with conventional search engines is that they measure document relevance through models that are often inaccurate or highly computation-intensive. Examples of such models include the term vector-space model, the probabilistic model, the latent semantic space model, and the like.
Hence, there is a need for a system and a method that modify and improve the conventional search engine architecture to execute such queries efficiently and return documents having a high degree of relevance.