Field of the Invention
The present invention is directed in general to the field of improved data processing apparatus, system, and method of operation. In one aspect, the present invention relates to an information handling system, method, and apparatus for evaluating the geographical relevance of answers in a Question Answering (QA) system.
Description of the Related Art
In the field of artificially intelligent computer systems capable of answering questions posed in natural language, cognitive question answering (QA) systems process questions posed in natural language to determine answers and associated confidence scores based on knowledge acquired by the QA system. Examples of QA systems include Siri® from Apple®, Cortana® from Microsoft®, the IBM Watson™ artificially intelligent question answering computer system available from International Business Machines (IBM®) Corporation of Armonk, N.Y., and other natural language question answering systems. The IBM Watson™ system is an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open domain question answering. The IBM Watson™ system uses a deep question answering technology for hypothesis generation, massive evidence gathering, analysis, and scoring. To generate answer candidates from an input question, the deep QA system takes an input question, analyzes it, and decomposes the question into constituent parts. In addition, the deep QA system extracts likely answers, in the form of words and short phrases, from documents in a document collection or database(s), and these likely answers are scored and ranked to generate one or more hypotheses based on the decomposed question and results of a primary search of answer sources. After performing hypothesis and evidence scoring based on a retrieval of evidence from evidence sources, the deep QA system performs synthesis of the one or more hypotheses and, based on trained models, performs a final merging and ranking to output one or more top ranked answers to the input question along with confidence measure(s). As will be appreciated, when questions posed to a deep question answering system have geographical information in the question, ignoring this information is a large source of errors.
For example, a request for information about goods manufactured in one region may produce an incorrect answer about similar or identical goods manufactured in another region if the generated answer does not take into account the geographic focus of the question. It follows that the accuracy of the answers depends on the ability to recognize the geographic information contained in the question and in candidate answers generated in response thereto. However, as explained below, it is a non-trivial matter to identify geographic information in answer candidates and accurately match it to questions which include corresponding geographic information. As a result, efficiently generating correct answers in response to questions containing geographic information remains extremely difficult at a practical level with existing solutions.
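By way of illustration only, the error mode described above can be sketched in a few lines of Python. The sketch is a toy under stated assumptions and is not the method of any particular QA system: the `Candidate` type, its `region` and `score` fields, the sample data, and both ranking functions are hypothetical names introduced for this example. The geographic check is a naive exact string comparison, which deliberately sidesteps the non-trivial matching problem noted above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical answer candidate produced by an upstream QA stage."""
    answer: str
    region: str   # region named in the supporting passage (assumed field)
    score: float  # text-match score from an assumed upstream scorer


def rank_ignoring_geography(candidates):
    """Rank purely by text-match score, ignoring geographic information."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def rank_with_geography(candidates, question_region):
    """Rank candidates whose region matches the question first, then by score.

    Assumption: exact, case-insensitive string equality stands in for real
    geographic matching (which would need aliases, containment, and so on).
    """
    def sort_key(c):
        matches = c.region.lower() == question_region.lower()
        return (matches, c.score)

    return sorted(candidates, key=sort_key, reverse=True)


# Made-up data: two similar goods manufactured in different regions.
candidates = [
    Candidate(answer="Brand A", region="Germany", score=0.91),
    Candidate(answer="Brand B", region="Japan", score=0.84),
]

# The question is assumed to ask about goods manufactured in Japan.
top_without_geo = rank_ignoring_geography(candidates)[0]
top_with_geo = rank_with_geography(candidates, "Japan")[0]
print(top_without_geo.answer, top_with_geo.answer)
```

In this toy data, the score-only ranking surfaces the candidate from the wrong region because its text-match score is higher, while the region-aware ranking places the matching candidate first.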