This invention generally relates to information retrieval, and more specifically, to assembling answers from multiple documents. Even more specifically, embodiments of the invention relate to Question Answering systems and methods implementing parallel analysis for providing answers to questions and in which candidate answers may be assembled from multiple documents.
Generally, question answering (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web or a local collection), a QA system should be able to retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval such as document retrieval, and QA is sometimes regarded as the next step beyond search engines.
QA research attempts to deal with a wide range of question types including: fact, list, definition, how, why, hypothetical, semantically-constrained, and cross-lingual questions. Search collections vary from small local document collections, to internal organization documents, to compiled newswire reports, to the World Wide Web.
Closed-domain question answering deals with questions under a specific domain (for example, medicine or automotive maintenance), and can be seen as an easier task because NLP systems can exploit domain-specific knowledge frequently formalized in ontologies. Alternatively, closed-domain might refer to a situation where only limited types of questions are accepted, such as questions asking for descriptive rather than procedural information. Open-domain question answering deals with questions about nearly everything, and can only rely on general ontologies and world knowledge. Open-domain QA systems, though, usually have much more data available from which to extract the answer.
Access to information is currently dominated by two paradigms: a database query that answers questions about what is in a collection of structured records; and a search that delivers a collection of document links in response to a query against a collection of unstructured data (text, HTML, etc.).
One major challenge in such information query paradigms is to provide a computer program capable of answering factual questions based on information included in a large collection of documents (of all kinds, structured and unstructured). Such questions can range from broad, such as “what are the risks of vitamin K deficiency”, to narrow, such as “when and where was Hillary Clinton's father born”.
User interaction with such a computer program could be either a single user-computer exchange or a multiple turn dialog between the user and the computer system. Such dialog can involve one or multiple modalities (text, voice, tactile, gesture, etc.). Examples of such interaction include a situation where a cell phone user asks a question using voice and receives an answer in a combination of voice, text and image (e.g., a map with a textual overlay and a spoken, computer-generated explanation). Another example would be a user interacting with a video game and dismissing or accepting an answer using machine-recognizable gestures, or the computer generating tactile output to direct the user.
The challenge in building such a computer system is to understand the query, to find appropriate documents that might contain the answer, and to extract the correct answer to be delivered to the user. Currently, understanding the query is an open problem because computers do not have the human ability to understand natural language, nor do they have the common sense needed to choose among the many possible interpretations that current (very elementary) natural language understanding systems can produce.
Being able to answer a factual query in one or multiple dialog turns is of great potential value as it enables real time access to accurate information. For instance, advancing the state of the art in question answering has substantial business value, since it provides a real time view of the business, its competitors, economic conditions, etc. Even if QA is in a most elementary form, it can improve productivity of information workers by orders of magnitude.
U.S. patent application Ser. No. 12/152,441, the disclosure of which is hereby incorporated herein by reference in its entirety, describes a QA system involving the generation of candidate answers and selecting a final answer (or ranking a list of final answers) from among the set of candidate answers.
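By way of illustration only, the general idea of generating candidate answers and then ranking them may be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of the referenced application: the function names `generate_candidates` and `rank_candidates`, the capitalized-phrase heuristic, the frequency-based score, and the toy passages are all hypothetical and chosen only to keep the example small.

```python
import re
from collections import Counter


def generate_candidates(passages):
    """Hypothetical generator: treat each run of capitalized words as a candidate."""
    candidates = []
    for passage in passages:
        candidates.extend(re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", passage))
    return candidates


def rank_candidates(candidates, question):
    """Hypothetical ranker: score each candidate by how often it was generated.

    Candidates made up entirely of words already present in the question are
    dropped, since repeating the question does not answer it.
    """
    question_terms = set(re.findall(r"\w+", question.lower()))
    kept = [
        c for c in candidates
        if not all(word in question_terms for word in c.lower().split())
    ]
    # most_common() returns (candidate, count) pairs, highest count first.
    return Counter(kept).most_common()


# Toy passages written for this illustration.
passages = [
    "Hugh Rodham was born in Scranton.",
    "Scranton is the birthplace of Hugh Rodham.",
    "He later moved to Chicago.",
]
question = "Where was Hugh Rodham born?"

ranked = rank_candidates(generate_candidates(passages), question)
final_answer = ranked[0][0]  # the top-ranked candidate
```

In this sketch the final answer is simply the most frequently generated candidate that does not repeat the question; a practical system would be expected to combine many scoring signals rather than raw frequency.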
Current information retrieval and question answering systems attempt to satisfy a user's information need by identifying the single document segment (e.g., entire document, contiguous sequence of one or more sentences, or a single phrase) that is most likely to contain relevant information. There are many information needs, however, that cannot be satisfied by a single document segment.