With the increased usage of computing networks, such as the Internet, humans are currently inundated and overwhelmed with the amount of structured and unstructured information available to them from various sources. Information gaps abound as users search for information on various subjects and try to piece together what they find and what they believe to be relevant. To assist with such searches, recent research has been directed to generating knowledge management systems which may take an input, analyze it, and return the results most likely to be responsive to the input. Knowledge management systems provide automated mechanisms for searching through a knowledge base with numerous sources of content, e.g., electronic documents, and analyzing them with regard to an input to determine a result and a confidence measure as to how accurate the result is in relation to the input.
One such knowledge management system is the IBM Watson™ system available from International Business Machines (IBM) Corporation of Armonk, N.Y. The IBM Watson™ system is an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open domain question answering. The IBM Watson™ Question Answering (QA) system is built on IBM's DeepQA technology used for hypothesis generation, massive evidence gathering, analysis, and scoring. DeepQA takes an input question, analyzes it, decomposes the question into constituent parts, generates one or more hypotheses based on both the decomposed question and the results of a primary search of answer sources, performs hypothesis and evidence scoring based on a retrieval of evidence from evidence sources, performs synthesis of the one or more hypotheses, and, based on trained models, performs a final merging and ranking to output an answer to the input question along with a confidence measure.
One challenge faced by QA systems is that ingested documents are generally treated as equals, in that no document overrides or supersedes another. When a publisher offers corrections to previously published documents, there is no way to process these corrections in place; generally, the new corpus will contain both the original and the corrected documents or facts. Additionally, evidence sources are typically assigned a quality rating either by a general classification type (e.g., peer-reviewed journals as high quality, user blogs as low quality, etc.) or by recording how many times an evidence source is referenced in answers produced by the QA system. These evidence quality calculations are too static and coarse-grained. They may not be updated when an evidence source is wrong, and may not be adjusted when that source issues a correction.
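The two rating schemes described above can be illustrated with a minimal sketch. It is not taken from any actual QA system: the names EvidenceSource, TYPE_RATING, rating_by_type, and rating_by_references, along with the numeric values, are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical ratings per classification type; the values are illustrative only.
TYPE_RATING = {"peer_reviewed_journal": 0.9, "user_blog": 0.2}


@dataclass
class EvidenceSource:
    name: str
    source_type: str
    times_referenced: int = 0  # how often produced answers have cited this source


def rating_by_type(source: EvidenceSource) -> float:
    """Scheme 1: rating fixed by the source's general classification type."""
    return TYPE_RATING.get(source.source_type, 0.5)  # 0.5 is an assumed default


def rating_by_references(source: EvidenceSource, max_references: int) -> float:
    """Scheme 2: rating proportional to how often the source is referenced."""
    if max_references <= 0:
        return 0.0
    return source.times_referenced / max_references


journal = EvidenceSource("Example Journal", "peer_reviewed_journal", times_referenced=40)
blog = EvidenceSource("Example Blog", "user_blog", times_referenced=10)

print(rating_by_type(journal), rating_by_references(journal, 40))
print(rating_by_type(blog), rating_by_references(blog, 40))
```

In this sketch, neither function takes any input recording that a source was wrong or that it later issued a correction, so neither event can change the rating, which is the limitation noted above.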