The present invention relates generally to the field of question answering technology, and more particularly to reconciling simultaneous ranking criteria in generating answers.
Question Answering (QA) is a computer science discipline, within the fields of information retrieval and natural language processing (NLP), that is concerned with building systems that automatically answer questions posed in a natural language. A QA implementation, usually a computer program, may construct answers by querying a structured database, or table, of knowledge or information, such as a knowledge base. More commonly, QA systems generate answers from an unstructured collection of natural language documents, or text corpus.
Unstructured data refers to information that is not organized according to a data model, that is, a specification of how the data items relate to one another. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. Unstructured data may nevertheless be indexed. For example, the occurrences of each word in a text document of unstructured data may be recorded in an index structure. Structured data, by contrast, is data that is organized according to a data model or schema. Generally, the term structured data is applied to relational databases, and the term unstructured data is applied to everything else.
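By way of illustration only, the indexing of word occurrences described above may be sketched as follows. The function name `build_index` and the tokenization rule (lowercase runs of letters and digits) are assumptions made for this sketch and are not taken from any particular QA system.

```python
import re
from collections import defaultdict


def build_index(text: str) -> dict[str, list[int]]:
    """Map each word to the token positions at which it occurs.

    Hypothetical helper: tokens are lowercase runs of letters and digits,
    and positions count tokens (not characters), starting at 0.
    """
    index: dict[str, list[int]] = defaultdict(list)
    tokens = re.finditer(r"[a-z0-9]+", text.lower())
    for position, match in enumerate(tokens):
        index[match.group()].append(position)
    return dict(index)


document = "The cat saw the dog"
index = build_index(document)
print(index["the"])  # token positions of "the" in the document
```

The text itself remains unstructured; the index merely records where each word occurs so that passages containing a given keyword can be located without scanning the whole document.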
Superlative/ordinal QA, also referred to as rankable QA, answers questions that include an ordinal, which gives a rank or position in a sequence, such as “first”, “second”, or “last”; a superlative, which indicates an extreme degree, such as “largest”, “smallest”, or “fastest”; or a combination of a superlative and an ordinal, such as “second largest”. Examples of rankable questions include “Who was the first/10th/most recent president?”, “What is the largest state?”, and “What is the 3rd tallest mountain?”. This type of QA typically requires lookup in a structured database or knowledge base.
Rankable criteria for structured data are often paired with Boolean filters, which may reduce the set of possible answers. For example, in the question “Who was the last Republican president?”, the word “Republican” acts as a Boolean filter on the set of presidents. The term Boolean implies that a given criterion is either entirely true or entirely false for the objects considered.
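By way of illustration only, a rankable lookup over structured data, optionally narrowed by a Boolean filter, may be sketched as follows. The record type `President`, the function `rank_lookup`, and its parameters are hypothetical names introduced for this sketch. The table is a deliberately small, incomplete subset, so answers are relative to that subset only.

```python
from typing import Callable, NamedTuple, Optional


class President(NamedTuple):
    name: str
    number: int  # ordinal position in the sequence of presidencies
    party: str


# Illustrative, incomplete subset of a structured table (assumption of this sketch).
PRESIDENTS = [
    President("George Washington", 1, "Unaffiliated"),
    President("Abraham Lincoln", 16, "Republican"),
    President("Theodore Roosevelt", 26, "Republican"),
    President("Franklin D. Roosevelt", 32, "Democratic"),
    President("Dwight D. Eisenhower", 34, "Republican"),
    President("John F. Kennedy", 35, "Democratic"),
]


def rank_lookup(
    rows: list[President],
    key: Callable[[President], int],
    position: int,
    descending: bool = False,
    keep: Optional[Callable[[President], bool]] = None,
) -> Optional[President]:
    """Return the row at 1-based `position` after filtering and sorting.

    `keep` is the Boolean filter: each row either satisfies it or does not.
    Returns None when fewer than `position` rows survive the filter.
    """
    candidates = [row for row in rows if keep is None or keep(row)]
    ordered = sorted(candidates, key=key, reverse=descending)
    if not 1 <= position <= len(ordered):
        return None
    return ordered[position - 1]


# "Who was the last Republican president?" evaluated against the subset above.
last_republican = rank_lookup(
    PRESIDENTS,
    key=lambda p: p.number,
    position=1,
    descending=True,
    keep=lambda p: p.party == "Republican",
)
print(last_republican.name if last_republican else "no answer")
```

In this sketch the Boolean filter is applied before ranking, so the ordinal or superlative is interpreted relative to the reduced set of possible answers rather than the full table.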
Non-rankable QA addresses questions that are not posed as rankable questions. Examples include “Which president had a handlebar mustache?” or “What country exports coffee and is home to elephants?”. Since structured data may not exist to answer this type of question, this type of QA typically requires identifying passages in an unstructured text corpus, for example, using keywords in the question, in order to estimate the probability that a candidate answer is correct. QA systems that operate in this way are referred to as probabilistic QA systems.
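By way of illustration only, keyword-based scoring of candidate answers against passages may be sketched as follows. The stopword list, the function names, the placeholder candidates “Country A” and “Country B”, and the passages are all hypothetical. Normalized keyword overlap is used here merely as a crude stand-in for a probability estimate; it is an assumption of this sketch, not a description of how any particular probabilistic QA system computes its scores.

```python
import re

# Small illustrative stopword list (assumption of this sketch).
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to", "what", "which"}


def keywords(text: str) -> set[str]:
    """Lowercase alphabetic tokens of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def score_candidates(
    question: str, passages: list[tuple[str, str]]
) -> dict[str, float]:
    """Score each candidate by keyword overlap with its supporting passages.

    `passages` holds (candidate, passage_text) pairs. Overlap counts are summed
    per candidate and normalized so that the scores add up to 1.
    """
    question_terms = keywords(question)
    raw: dict[str, int] = {}
    for candidate, passage in passages:
        overlap = len(question_terms & keywords(passage))
        raw[candidate] = raw.get(candidate, 0) + overlap
    total = sum(raw.values())
    if total == 0:
        return {candidate: 0.0 for candidate in raw}
    return {candidate: count / total for candidate, count in raw.items()}


question = "What country exports coffee and is home to elephants?"
passages = [  # hypothetical passages about placeholder candidates
    ("Country A", "Country A exports coffee and tea."),
    ("Country A", "Elephants roam the forests of Country A."),
    ("Country B", "Country B exports copper."),
]
scores = score_candidates(question, passages)
best = max(scores, key=scores.get)
print(best, scores)
```

Here the candidate supported by passages that mention both coffee exports and elephants receives the higher score, which mirrors how evidence from several passages may be combined to rank candidate answers.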