A question and answer system (Q and A system) is an artificial intelligence application executing on data processing hardware. A Q and A system answers questions that are presented in natural language and that pertain to a given subject-matter domain.
A Q and A system is an existing application that is capable of replying with natural language answers when presented with natural language questions and one or more suitable knowledge bases pertaining to the subject-matter domain of the question. IBM Watson is an example of a Q and A engine. (IBM and Watson are trademarks of International Business Machines Corporation in the United States and in other countries.)
A Q and A system can be configured to receive inputs from various sources. For example, the Q and A system may receive as input over a network, a corpus of electronic documents or other data, data from a content creator, information from one or more content users, and other such inputs from other possible sources of input. Data storage devices store the corpus of data. The Q and A system can operate in environments of any size, including local and global, e.g., the Internet. Additionally, a Q and A system can be configured to serve as a front-end system that can make available a variety of knowledge extracted from or represented in documents, network-accessible sources and/or structured data sources. In this manner, some processes populate the Q and A system with input interfaces to receive knowledge requests and respond accordingly.
A content creator creates content in a document for use as part of a corpus of data with the Q and A system. The document may include any file, text, article, or source of data for use in the Q and A system. For example, a Q and A system accesses a body of knowledge about the domain, where the body of knowledge (knowledgebase) can be organized in a variety of configurations. A knowledgebase of a domain can include, for instance, a structured repository of domain-specific information, such as ontologies, or unstructured data related to the domain, or a collection of natural language documents about the domain.
Content users input questions to the Q and A system, which the Q and A system answers using the content in the corpus of data. When a process evaluates a given section of a document for semantic content, the process can use a variety of conventions to query such a document from the Q and A system.
One convention is to send the query to the Q and A system as a well-formed question. Semantic content is content based on the relation between signifiers, such as words, phrases, signs, and symbols, and what they stand for, their denotation, or connotation. In other words, semantic content is content that interprets an expression, such as by using Natural Language Processing. In one instance, the process sends well-formed questions (e.g., natural language questions) to the Q and A system. The Q and A system interprets the question and provides a response to the content user containing one or more answers to the question. In another instance, the Q and A system provides a response to users in a ranked list of answers.
The Q and A system receives an input question, parses the question to extract the major features of the question, uses the extracted features to formulate queries, and applies those queries to the corpus of data. Based on the application of the queries to the corpus of data, the Q and A system generates a set of hypotheses or candidate answers to the input question, by looking across the corpus of data for portions of the corpus of data that have some potential for containing a valuable response to the input question.
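The steps above (parse the question, extract its major features, formulate queries, and apply them to the corpus to find candidate portions) can be illustrated with a minimal sketch. The sketch is only a toy under stated assumptions: the stop-word list, the function names `extract_features` and `find_candidate_passages`, the `min_overlap` parameter, and the three-sentence corpus are all hypothetical, and simple keyword overlap stands in for whatever feature extraction and querying a real Q and A system performs.

```python
import re

# Hypothetical stop-word list; a real system would use far richer parsing.
STOP_WORDS = {"a", "an", "the", "is", "of", "what", "who", "which", "in", "was", "are"}


def extract_features(question):
    """Return the lowercase content words of the question, without stop words."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def find_candidate_passages(question, corpus, min_overlap=1):
    """Return (passage, shared_terms) for passages that share at least
    min_overlap feature terms with the question."""
    features = set(extract_features(question))
    hits = []
    for passage in corpus:
        passage_terms = set(re.findall(r"[a-z0-9]+", passage.lower()))
        overlap = features & passage_terms
        if len(overlap) >= min_overlap:
            hits.append((passage, sorted(overlap)))
    return hits


# Hypothetical miniature corpus, for illustration only.
corpus = [
    "Paris is the capital of France.",
    "The Rhine flows through Germany.",
    "France borders Spain and Italy.",
]
hits = find_candidate_passages("What is the capital of France?", corpus)
for passage, shared in hits:
    print(passage, shared)
```

In this sketch, the passages that survive the overlap filter play the role of the portions of the corpus that have some potential for containing a response; they are candidates only, and are not yet scored or ranked.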
The Q and A system then performs deep analysis on the language of the input question and the language used in each of the portions of the corpus of data found during the application of the queries using a variety of reasoning algorithms. There may be hundreds or even thousands of reasoning algorithms applied, each of which performs different analysis, e.g., comparisons, and generates a score. For example, some reasoning algorithms may look at the matching of terms and synonyms within the language of the input question and the found portions of the corpus of data. Other reasoning algorithms may look at temporal or spatial features in the language, while others may evaluate the source of the portion of the corpus of data and evaluate its veracity.
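As a rough illustration of how separate reasoning algorithms might each produce a score for the same portion of the corpus, the following sketch defines two simple scorers: one that matches terms and synonyms, and one that evaluates the source. The synonym table, the source-trust values, the default of 0.5 for unknown sources, and all function names are hypothetical assumptions made for this example; they are not taken from any particular Q and A system.

```python
import re

# Hypothetical lookup tables, for illustration only.
SYNONYMS = {"capital": {"capital", "seat"}, "big": {"big", "large"}}
SOURCE_TRUST = {"encyclopedia": 0.9, "forum": 0.4}


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_match_score(question, passage):
    """Fraction of question terms found in the passage, directly or via a synonym."""
    q_terms = tokens(question)
    p_terms = tokens(passage)
    if not q_terms:
        return 0.0
    matched = sum(1 for t in q_terms if SYNONYMS.get(t, {t}) & p_terms)
    return matched / len(q_terms)


def source_score(source):
    """Assumed reliability of the source; unknown sources get a neutral 0.5."""
    return SOURCE_TRUST.get(source, 0.5)


def score_passage(question, passage, source):
    """Run every scorer and return one score per reasoning algorithm."""
    return {
        "term_match": term_match_score(question, passage),
        "source_reliability": source_score(source),
    }


scores = score_passage("capital France", "Paris is the seat of France", "encyclopedia")
print(scores)
```

Each scorer looks at one area of focus and returns its own score, so further algorithms (for example, temporal or spatial checks) could be added as additional entries in the returned dictionary.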
The scores obtained from the various reasoning algorithms indicate the extent to which the potential response is inferred by the input question based on the specific area of focus of that reasoning algorithm. Each resulting score is then weighted against a statistical model. The statistical model captures how well the reasoning algorithm performed at establishing the inference between two similar passages for a particular domain during the training period of the Q and A system. The statistical model may then be used to summarize a level of confidence that the Q and A system has regarding the evidence that the potential response, i.e., candidate answer, is inferred by the question. This process may be repeated for each of the candidate answers until the Q and A system identifies candidate answers that surface as being significantly stronger than others and thus generates a final answer, or ranked set of answers, for the input question.
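One way to picture the weighting and ranking described above is a weighted sum of the per-algorithm scores passed through a logistic function, followed by a sort on the resulting confidence. This is only a sketch under assumptions: the logistic form, the weight and bias values, the scorer names, and the candidate data are all hypothetical stand-ins for a trained statistical model, which the text does not specify.

```python
import math

# Hypothetical weights standing in for a trained statistical model.
WEIGHTS = {"term_match": 2.0, "temporal": 0.5, "source_reliability": 1.0}
BIAS = -1.5


def confidence(scores):
    """Combine per-algorithm scores into a single confidence between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates):
    """Return (answer, confidence) pairs sorted from most to least confident."""
    scored = [(answer, confidence(scores)) for answer, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate answers with made-up per-algorithm scores.
candidates = {
    "Lyon": {"term_match": 0.3, "temporal": 0.5, "source_reliability": 0.6},
    "Paris": {"term_match": 0.9, "temporal": 0.5, "source_reliability": 0.8},
}
ranked = rank_candidates(candidates)
for answer, conf in ranked:
    print(f"{answer}: {conf:.3f}")
```

The first element of `ranked` would serve as the final answer, and the full list corresponds to a ranked set of answers.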
More information about the IBM Watson™ Q and A system may be obtained, for example, from the IBM Corporation website, IBM Redbooks, and the like. For example, information about the IBM Watson™ Q and A system can be found in Yuan et al., “Watson and Healthcare,” IBM developerWorks, 2011 and “The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works” by Rob High, IBM Redbooks, 2012.
A method of determining the accuracy of a Q and A system includes verifying an answer provided by the Q and A system using a set of acceptable answers. The set of acceptable answers is called an answer key. If the Q and A system's answer for a question matches an answer for that question in the answer key, the Q and A system is deemed to have responded correctly; otherwise, it is deemed to have responded incorrectly. The proportion of correct to incorrect answers for a battery of questions in a given domain corresponds to the accuracy of the Q and A system in that domain.
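The answer-key check can be sketched as follows. The sketch assumes that accuracy is reported as the fraction of questions answered correctly, that a match means equality after lowercasing and whitespace normalization, and that an unanswered question counts as incorrect; these choices, the function names, and the sample data are illustrative assumptions rather than part of the method as described.

```python
def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def accuracy(system_answers, answer_key):
    """Fraction of answer-key questions for which the system's answer
    matches one of the acceptable answers."""
    if not answer_key:
        raise ValueError("answer key is empty")
    correct = 0
    for question, acceptable in answer_key.items():
        given = system_answers.get(question)
        if given is not None and normalize(given) in {normalize(a) for a in acceptable}:
            correct += 1
    return correct / len(answer_key)


# Hypothetical answer key and system output, for illustration only.
answer_key = {
    "capital of France?": ["Paris"],
    "largest planet?": ["Jupiter", "the planet Jupiter"],
    "boiling point of water in Celsius?": ["100", "100 degrees"],
    "author of Hamlet?": ["William Shakespeare", "Shakespeare"],
}
system_answers = {
    "capital of France?": "paris",
    "largest planet?": "Saturn",
    "boiling point of water in Celsius?": "100  degrees",
    "author of Hamlet?": "Shakespeare",
}
result = accuracy(system_answers, answer_key)
print(result)
```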
Accuracy of a Q and A system in a domain can be tested using one or more answer keys. For example, an answer key in a domain may contain acceptable answers to questions pertaining to one sub-domain but not another sub-domain. Therefore, Q and A system answers pertaining to different sub-domains have to be verified using different answer keys in that domain. Accuracy in different domains is similarly tested using one or more domain-specific answer keys.
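Verification against separate answer keys per sub-domain might be organized as in the following sketch, in which each answer is checked only against the key for its own sub-domain. The sub-domain names, the data layout, the exact-match comparison, and the choice to skip questions that no key covers are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_subdomain(results, answer_keys):
    """results: iterable of (subdomain, question, answer) tuples.
    answer_keys: mapping of subdomain -> {question: set of acceptable answers}.
    Returns a mapping of subdomain -> fraction of verifiable answers that are correct."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subdomain, question, answer in results:
        key = answer_keys.get(subdomain)
        if key is None or question not in key:
            continue  # no applicable answer key, so the answer cannot be verified
        total[subdomain] += 1
        if answer in key[question]:
            correct[subdomain] += 1
    return {s: correct[s] / total[s] for s in total}


# Hypothetical sub-domain answer keys and system results.
answer_keys = {
    "cardiology": {"q1": {"aspirin"}, "q2": {"ecg", "electrocardiogram"}},
    "oncology": {"q3": {"biopsy"}},
}
results = [
    ("cardiology", "q1", "aspirin"),
    ("cardiology", "q2", "x-ray"),
    ("oncology", "q3", "biopsy"),
    ("neurology", "q4", "mri"),  # no key for this sub-domain; skipped
]
per_subdomain = accuracy_by_subdomain(results, answer_keys)
print(per_subdomain)
```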