Field of the Invention
The present invention relates to systems and methods for searching a large corpus of electronically-stored data to identify contextually relevant search results and for improving the accuracy of the search results by modifying the search query.
Description of the Related Art
There has been much research on search and retrieval of electronically-stored documents. Typically, searching requires knowledge of a specific term or set of terms contained in the documents. Similarity-based document retrieval allows the user to fetch “more documents like this one” by using a general document-similarity score computed from word counts, without regard to the context in which the words appear. In addition, many information retrieval and folksonomic techniques, such as keyword search and data tagging, offer a simple interface but lack expressive search capabilities.
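By way of illustration only, the word-count similarity described above can be sketched as follows. This is a minimal sketch of a generic bag-of-words cosine score, not a description of any particular prior system; the function names word_counts, cosine_similarity, and more_like_this are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count word occurrences, discarding word order and context."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query_doc, corpus):
    """Rank corpus documents by word-count similarity to query_doc, best first."""
    q = word_counts(query_doc)
    scored = [(cosine_similarity(q, word_counts(d)), d) for d in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Two sentences with opposite meanings contain the same words the same number
# of times, so a word-count score treats them as essentially identical.
score = cosine_similarity(
    word_counts("the dog bit the man"),
    word_counts("the man bit the dog"),
)
```

Because the score depends only on word counts, the two sentences in the example receive a similarity of approximately 1.0 even though they describe different events, which is the context-blindness noted above.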
Conventional computer database applications may offer powerful search capability, but often use complicated, difficult-to-use interfaces, and are inherently brittle because they are tied to the underlying database schema. Conventional Natural Language Processing (NLP)-based systems may offer a simple interface and the potential for expressive search capability, but there are often semantic mismatches between the user's input and the machine's interpretation of the query.
Accordingly, such search techniques may produce too many false positives and miss too many relevant documents. Relevant documents may be missed, for example, because the literal terms of the search query are applied too rigidly to the documents in the corpus, so that documents using terms closely related to those originally provided in the search query are not identified.
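By way of illustration only, the following sketch contrasts a rigid literal-term match with a match against a query that has been broadened with related terms. It is a generic example under stated assumptions and is not taken from this document: the function names literal_search and expanded_search, the sample corpus, and the RELATED table of related terms are all hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def literal_search(query, corpus):
    """Return documents that contain every query term exactly as typed."""
    q = tokens(query)
    return [d for d in corpus if q <= tokens(d)]


def expanded_search(query, corpus, related):
    """Return documents in which each query term, or a related term, appears."""
    # Each group holds one query term plus any related terms supplied for it.
    groups = [{t} | related.get(t, set()) for t in tokens(query)]
    return [d for d in corpus if all(g & tokens(d) for g in groups)]


# Hypothetical sample data for the illustration.
corpus = [
    "The physician prescribed rest.",
    "The doctor prescribed rest.",
    "Rest is free.",
]
RELATED = {"doctor": {"physician"}}  # assumed related-term table

literal_hits = literal_search("doctor prescribed", corpus)
expanded_hits = expanded_search("doctor prescribed", corpus, RELATED)
```

In this example the literal search returns only the document containing the word "doctor", while the broadened query also returns the document that uses the closely related term "physician".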