The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for training a similar passage cognitive system using a ground truth answer key from a question answering cognitive system.
With the increased usage of computing networks, such as the Internet, humans are currently inundated and overwhelmed with the amount of information available to them from various structured and unstructured sources. However, information gaps abound as users try to piece together what they can find that they believe to be relevant during searches for information on various subjects. To assist with such searches, recent research has been directed to generating question answering (QA) systems which may take an input question, analyze it, and return results indicative of the most probable answers to the input question. QA systems provide automated mechanisms for searching through a large corpus of information, i.e., large sets of sources of content such as electronic documents, and for analyzing that content with regard to an input question to determine answers to the question and a confidence measure per answer indicating the probability that the answer is a useful answer for the input question.
Examples of QA systems are Siri® from Apple®, Cortana® from Microsoft®, and the IBM Watson™ system available from International Business Machines (IBM®) Corporation of Armonk, N.Y. The IBM Watson™ system is an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of question answering. The IBM Watson™ system is built on IBM's DeepQA™ technology, which is used for hypothesis generation, massive evidence gathering, analysis, and scoring. DeepQA™ takes an input question, analyzes it, decomposes the question into constituent parts, generates one or more hypotheses based on the decomposed question and results of a primary search of answer sources, performs hypothesis and evidence scoring based on a retrieval of evidence from evidence sources, performs synthesis of the one or more hypotheses, and, based on trained models, performs a final merging and ranking to output an answer to the input question along with a confidence measure.
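By way of illustration only, the general flow described above (decomposing a question, performing a primary search, scoring candidate hypotheses against evidence, and ranking the results with a confidence measure) may be sketched as follows. This sketch is not the DeepQA™ implementation; the toy corpus, the stop-word list, the function names, and the lexical-overlap score used as a stand-in for a trained confidence model are all assumptions introduced here for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy corpus standing in for the answer and evidence sources.
CORPUS = [
    "Armonk is a hamlet in New York where IBM has its headquarters.",
    "IBM Watson is a question answering system built on DeepQA technology.",
    "Cortana is a virtual assistant from Microsoft.",
]

# Assumed stop-word list; a real system would use far richer linguistic analysis.
STOPWORDS = {"the", "a", "an", "is", "of", "in", "on", "what", "which",
             "who", "where", "from", "has", "its"}


@dataclass
class Hypothesis:
    answer: str        # candidate answer (here, simply the matching passage)
    confidence: float  # illustrative score in [0, 1], not a trained probability


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def decompose(question):
    """Decompose the question into its constituent key terms."""
    return set(tokenize(question))


def primary_search(terms, corpus):
    """Return passages that share at least one key term with the question."""
    return [p for p in corpus if terms & set(tokenize(p))]


def score_evidence(terms, passage):
    """Score a passage as the fraction of question terms it contains."""
    if not terms:
        return 0.0
    return len(terms & set(tokenize(passage))) / len(terms)


def answer_question(question, corpus=CORPUS):
    """Generate, score, and rank hypotheses, highest confidence first."""
    terms = decompose(question)
    candidates = primary_search(terms, corpus)
    hypotheses = [Hypothesis(p, score_evidence(terms, p)) for p in candidates]
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


if __name__ == "__main__":
    for h in answer_question("Which system is built on DeepQA technology?"):
        print(f"{h.confidence:.2f}  {h.answer}")
```

In this sketch each stage is a separate function so that the simple term-overlap scorer could be replaced by a trained model without altering the surrounding flow; the confidence value shown is merely a ratio of matched terms and should not be read as a calibrated probability.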