The present disclosure relates to generating domain corpus subsets from a document corpus to enhance the accuracy of a question-answer system.
An overwhelming amount of information is available to individuals through computer networks from various structured and unstructured sources. To assist with user searches, question-answer (QA) systems are being developed that analyze an input question and return results indicative of the most probable answer to the input question. QA systems provide automated mechanisms that analyze large sets of content sources (e.g., electronic documents) corresponding to an input question to determine an answer and a confidence level indicating the answer's relative accuracy.
In an unstructured information system, information sources utilize various information domains and subdomains to respond to user search requests. However, highly skilled developers are required to build a customized system with completely accurate rules, which the system then uses to generate a corpus of documents for the search requests. In addition, as with any customized system, such a question-answer system becomes fragile, inflexible, and expensive to maintain.