With the increased usage of computing networks, such as the Internet, humans are currently overwhelmed by the amount of structured and unstructured information available to them from various sources. Information gaps abound as users search for information on various subjects and try to piece together what they find and what they believe to be relevant. To assist with such searches, recent research has been directed to generating knowledge management systems that take an input, analyze it, and return the results most likely to be responsive to that input. Knowledge management systems provide automated mechanisms for searching through a knowledge base with numerous sources of content, e.g., electronic documents, and for analyzing those sources with regard to an input to determine a result and a confidence measure indicating how accurate the result is in relation to the input.
One such knowledge management system is the IBM Watson™ system available from International Business Machines (IBM) Corporation of Armonk, N.Y. The IBM Watson™ system is an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open domain question answering. The IBM Watson™ Question Answering (QA) system is built on IBM's DeepQA technology used for hypothesis generation, massive evidence gathering, analysis, and scoring. DeepQA takes an input question, analyzes it, decomposes the question into constituent parts, generates one or more hypotheses based on both the decomposed question and the results of a primary search of answer sources, performs hypothesis and evidence scoring based on a retrieval of evidence from evidence sources, performs synthesis of the one or more hypotheses, and, based on trained models, performs a final merging and ranking to output an answer to the input question along with a confidence measure.
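For illustration only, the stages listed above (question decomposition, hypothesis generation from a primary search, evidence scoring, and a final merge and rank with a confidence measure) can be reduced to a toy Python sketch. It is not IBM's DeepQA implementation and uses no trained models: the tiny corpus, the stopword-based decomposition, the term-overlap evidence score, and the names `CORPUS`, `decompose`, `generate_hypotheses`, `score`, and `answer` are all hypothetical assumptions chosen only to make the flow from question to confidence-weighted answer concrete.

```python
from dataclasses import dataclass

# Hypothetical toy corpus: each key is a candidate answer, each value an evidence passage.
CORPUS: dict[str, str] = {
    "Armonk": "IBM is headquartered in Armonk, New York.",
    "Paris": "The Eiffel Tower is located in Paris, France.",
    "Ottawa": "Ottawa is the capital city of Canada.",
}

# Hypothetical stopword list used by the crude question decomposition below.
STOPWORDS = {"what", "where", "who", "which", "is", "the", "of", "in", "a", "an"}


@dataclass
class Answer:
    text: str
    confidence: float


def tokenize(text: str) -> set[str]:
    """Lowercase the text and strip surrounding punctuation from each word."""
    return {w.strip("?.,!").lower() for w in text.split()} - {""}


def decompose(question: str) -> set[str]:
    """Reduce the question to its content terms (stand-in for question analysis)."""
    return tokenize(question) - STOPWORDS


def generate_hypotheses(terms: set[str]) -> list[str]:
    """Primary search: every candidate whose passage shares at least one term."""
    return [cand for cand, passage in CORPUS.items() if terms & tokenize(passage)]


def score(terms: set[str], candidate: str) -> float:
    """Evidence score: fraction of question terms found in the candidate's passage."""
    return len(terms & tokenize(CORPUS[candidate])) / len(terms)


def answer(question: str, top_k: int = 1) -> list[Answer]:
    """Decompose, generate, score, then merge and rank into confidence-weighted answers."""
    terms = decompose(question)
    hypotheses = generate_hypotheses(terms)
    scores = {h: score(terms, h) for h in hypotheses}
    total = sum(scores.values())
    if total == 0:
        return []  # no hypothesis has any supporting evidence
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Normalize so the confidences across all hypotheses sum to 1.
    return [Answer(h, s / total) for h, s in ranked[:top_k]]


print(answer("What is the capital of Canada?"))
# prints [Answer(text='Ottawa', confidence=1.0)]
```

A real system replaces each of these toy stages with far richer analysis and learned scoring models; the sketch only shows how the stages hand data to one another.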
When training a QA system, an answer key is generally used that provides questions and their correct answers, against which the system's output is verified when the system runs. Typically, such answer keys are hand-built for a particular domain by an expert who pulls this information together. Consequently, vast amounts of time and expert resources are often needed to adequately train the QA system for a particular domain.
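A minimal sketch of how such an answer key might be used to verify a QA system follows. It assumes the key is represented as a mapping from each question to the set of answers an expert accepts, and that correctness is judged by exact string match; `ANSWER_KEY`, `verify`, and the stub system are hypothetical names introduced only for illustration.

```python
from typing import Callable

# Hypothetical hand-built answer key: each question maps to the answers an expert accepts.
ANSWER_KEY: dict[str, set[str]] = {
    "What is the capital of Canada?": {"Ottawa"},
    "Where is IBM headquartered?": {"Armonk", "Armonk, N.Y."},
}


def verify(
    qa_system: Callable[[str], str], key: dict[str, set[str]]
) -> tuple[float, list[str]]:
    """Run every key question through the system; return accuracy and missed questions."""
    missed = []
    for question, accepted in key.items():
        # Assumed criterion: the system's answer must exactly match an accepted answer.
        if qa_system(question) not in accepted:
            missed.append(question)
    accuracy = 1.0 - len(missed) / len(key) if key else 0.0
    return accuracy, missed


def stub_system(question: str) -> str:
    """Hypothetical QA system that always answers "Ottawa"."""
    return "Ottawa"


print(verify(stub_system, ANSWER_KEY))
# prints (0.5, ['Where is IBM headquartered?'])
```

Every entry in `ANSWER_KEY` must be authored by hand, which is exactly the expert effort the paragraph describes.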