With the increased usage of computing networks, such as the Internet, humans are currently inundated and overwhelmed with the amount of structured and unstructured information available to them from various sources. Information gaps abound as users search for information on various subjects and try to piece together what they find and what they believe to be relevant. To assist with such searches, recent research has been directed to generating knowledge management systems which may take an input, analyze it, and return results indicative of the most probable answers to the input. Knowledge management systems provide automated mechanisms for searching through a knowledge base with numerous sources of content, e.g., electronic documents, and analyzing them with regard to an input to determine a result and a confidence measure as to how accurate the result is in relation to the input.
One such knowledge management system is the IBM Watson™ system available from International Business Machines (IBM) Corporation of Armonk, N.Y. The IBM Watson™ system is an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open domain question answering. The IBM Watson™ system is built on IBM's DeepQA technology used for hypothesis generation, massive evidence gathering, analysis, and scoring. DeepQA takes an input question, analyzes it, decomposes the question into constituent parts, generates one or more hypotheses based on both the decomposed question and the results of a primary search of answer sources, performs hypothesis and evidence scoring based on a retrieval of evidence from evidence sources, performs synthesis of the one or more hypotheses, and, based on trained models, performs a final merging and ranking to output an answer to the input question along with a confidence measure.
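For illustration only, the sequence of stages described above (question decomposition, hypothesis generation from a primary search, evidence scoring, and final ranking with a confidence measure) may be sketched as a toy pipeline. This sketch is not the DeepQA implementation. The corpus, the stop-word list, the function names, and the term-overlap scoring are all assumptions introduced here, and a simple score normalization stands in for the trained merging and ranking models.

```python
from dataclasses import dataclass

# Hypothetical toy corpus; it stands in for both answer and evidence sources.
CORPUS = {
    "Armonk": "IBM is headquartered in Armonk, New York.",
    "Paris": "Paris is the capital of France.",
    "Lyon": "Lyon is a large city in France.",
}

# Hypothetical stop-word list used during question decomposition.
STOPWORDS = {"what", "is", "the", "of", "a", "in", "where"}


@dataclass
class Hypothesis:
    answer: str
    score: float = 0.0


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return {w.strip(".,?").lower() for w in text.split()} - {""}


def decompose(question):
    """Question analysis and decomposition: keep content-bearing terms."""
    return tokenize(question) - STOPWORDS


def generate_hypotheses(terms):
    """Primary search: any entry whose passage shares a term is a candidate."""
    return [Hypothesis(key) for key, passage in CORPUS.items()
            if terms & tokenize(passage)]


def score_evidence(hypothesis, terms):
    """Evidence scoring: fraction of question terms found in the passage."""
    return len(terms & tokenize(CORPUS[hypothesis.answer])) / len(terms)


def answer(question):
    """Return (best answer, confidence), or (None, 0.0) if nothing matches."""
    terms = decompose(question)
    hypotheses = generate_hypotheses(terms)
    if not hypotheses:
        return None, 0.0
    for h in hypotheses:
        h.score = score_evidence(h, terms)
    # Assumption: normalizing scores replaces the trained ranking models.
    hypotheses.sort(key=lambda h: h.score, reverse=True)
    total = sum(h.score for h in hypotheses)
    best = hypotheses[0]
    return best.answer, best.score / total


best, confidence = answer("What is the capital of France?")
print(best, round(confidence, 3))
```

In this sketch, the confidence is merely the top candidate's share of the total evidence score; a production system would instead derive it from models trained on question-answer pairs.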
The testing of question-answering system metrics (e.g., accuracy) uses a set of questions with corresponding correct answers. Question-answering systems that make use of supervised machine learning require a similar but independent set of question-answer pairs to enable the training of models. In most domains, such question-answer sets are not immediately available and must be created by domain experts. This creation process is a time-consuming and error-prone task. Errors in the question-answer sets lead to inaccuracy in predicting system question-answering performance. Errors also lead to machine learning models trained on incorrectly classified instances. These problems are especially costly when few question-answer pairs are available, when writing new pairs takes significant effort, and when detecting errors requires detailed post hoc analysis.
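By way of a minimal, hypothetical example, the following sketch shows how accuracy may be measured against a question-answer set, how independent training and test sets may be separated, and how a single erroneous entry in the set distorts the measured accuracy. The question identifiers, the answers, the exact-match criterion, and the function names are assumptions chosen for illustration and are not taken from any particular system.

```python
import random


def accuracy(system_answers, gold_answers):
    """Fraction of gold questions whose system answer matches exactly."""
    if not gold_answers:
        raise ValueError("gold_answers must not be empty")
    correct = sum(1 for question, gold in gold_answers.items()
                  if system_answers.get(question) == gold)
    return correct / len(gold_answers)


def split_pairs(pairs, test_fraction=0.5, seed=0):
    """Split question-answer pairs into disjoint training and test sets."""
    questions = sorted(pairs)
    random.Random(seed).shuffle(questions)
    cut = int(len(questions) * (1 - test_fraction))
    train = {q: pairs[q] for q in questions[:cut]}
    test = {q: pairs[q] for q in questions[cut:]}
    return train, test


# Hypothetical data: the true answers and the answers a system produced.
true_answers = {"q1": "Paris", "q2": "Armonk", "q3": "Berlin", "q4": "Rome"}
system_answers = {"q1": "Paris", "q2": "Armonk", "q3": "Berlin", "q4": "Milan"}

# The same set with one entry recorded incorrectly by its author.
noisy_gold = dict(true_answers, q2="Armonk, CA")

print(accuracy(system_answers, true_answers))  # 0.75
print(accuracy(system_answers, noisy_gold))    # 0.5

train, test = split_pairs(true_answers)
print(sorted(train), sorted(test))
```

With only four pairs, one erroneous entry shifts the measured accuracy from 0.75 to 0.5, which illustrates why such errors are most damaging when few question-answer pairs are available.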