The goal of developing computational machinery that can generate answers to freely posed questions, or provide relevant information in response to free-text queries, has long been sought. General search services and question-answering systems depend on techniques for analyzing free-text queries or questions, and on techniques for composing or identifying relevant information or explicit answers from a data set or database of information. Providing relevant information or explicit answers to freely worded queries or questions can be a challenging problem because a structured or unstructured data set being searched may not contain explicit matching information or answers. In addition, a data set may contain multiple variants of relevant answers or answer components.
Approaches to information retrieval and question answering have relied on the application of several key concepts from information retrieval, information extraction, machine learning, and natural language processing (NLP). Automatic question answering from a single, constrained information source is extremely challenging. Consider the difficulty of gleaning an answer to the question “Who killed Abraham Lincoln?” from a source that contains only the text “John Wilkes Booth altered history with a bullet. He will forever be known as the man who ended Abraham Lincoln's life.” That text never uses the word “killed,” so the answer cannot be found by matching the wording of the question. By contrast, answering such a question is easier when the vast resources of the internet are used, since hundreds of web pages contain the literal string “killed Abraham Lincoln,” providing multiple opportunities for matching and composition.
Many efforts in question answering have focused on fact-based, short-answer questions such as “Who killed Abraham Lincoln?”, “What was the length of the Wright brothers' first flight?”, “When did CNN begin broadcasting?” or “What two US biochemists won the Nobel Prize in medicine in 1992?” Some question-answering systems have used NLP analyses to augment standard information retrieval (IR) techniques. These systems may identify candidate passages using IR techniques, and then perform more detailed linguistic analyses of both the question and the matching passages to find specific answers. A variety of linguistic resources (part-of-speech tagging, parsing, named entity extraction, semantic relations, dictionaries, etc.) may be used to support question answering. Other approaches may use general information retrieval methods that rewrite questions or reformulate queries to match the format of answers, and then combine multiple results to generate answers.
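By way of illustration only, the rewrite-and-combine approach mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any particular system: the function names rewrite_question and candidate_answers, the single "Who ..." rewrite rule, the capitalized-word heuristic for candidate answers, and the sample snippets are all hypothetical and were chosen for this example.

```python
import re
from collections import Counter


def rewrite_question(question):
    """Rewrite a 'Who <predicate>?' question as the phrase an answer would contain."""
    text = question.strip().rstrip("?").strip()
    match = re.match(r"who\s+(.+)", text, flags=re.IGNORECASE)
    # Fall back to the raw text for question forms this sketch does not handle.
    return [match.group(1)] if match else [text]


def candidate_answers(snippets, rewrites):
    """Count runs of 1 to 3 capitalized words that directly precede a rewrite."""
    counts = Counter()
    for snippet in snippets:
        for phrase in rewrites:
            pattern = r"\b((?:[A-Z][a-z]+\s+){1,3})" + re.escape(phrase)
            for found in re.finditer(pattern, snippet):
                counts[found.group(1).strip()] += 1
    return counts


# Hypothetical snippets standing in for passages returned by a search service.
snippets = [
    "John Wilkes Booth killed Abraham Lincoln in 1865.",
    "Historians agree that John Wilkes Booth killed Abraham Lincoln.",
    "The actor Booth killed Abraham Lincoln at Ford's Theatre.",
]

# The single constrained source discussed earlier, which never says "killed".
single_source = [
    "John Wilkes Booth altered history with a bullet. He will forever be "
    "known as the man who ended Abraham Lincoln's life."
]

rewrites = rewrite_question("Who killed Abraham Lincoln?")
votes = candidate_answers(snippets, rewrites)
best_answer = votes.most_common(1)[0][0]
print(best_answer)
```

In this sketch the three redundant snippets yield two votes for the full name and one for the surname alone, so simple counting selects the full name, whereas the single constrained source yields no candidate at all because the rewritten phrase never appears in it.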
Most information retrieval systems used in searching operate at the level of entire documents. For example, in searching the web, pointers to complete web pages or documents are returned in response to a search query. However, there has been interest in finer-grained analyses that focus on obtaining answers to questions, rather than just retrieving potentially relevant documents or the best-matching passages for search queries.