With advances in natural language processing (NLP), there is increasing demand to integrate NLP techniques into question answering (QA) systems. Current techniques for computerized question answering rely on document retrieval search engines to retrieve documents that may contain information related to a question posed to the QA system. Conventional search engines return one caption or snippet per retrieved document in the search result. However, these snippets merely give users a quick impression of whether a document is likely to be relevant to their information request. The user often needs to open the document and read it to determine whether the document actually answers the question asked, or whether it simply contains similar terminology.
Because relevant terms and information are often scattered across different parts of a document, search result captions from conventional document search engines are often fragmentary and difficult to interpret unambiguously without additional context. Conventional search engines pull these scattered snippets together as best they can into a single piece of caption text, at the cost of the text often not being well-formed language and of leaving unclear whether the scattered terms really stand in a meaningful relationship to one another. Accordingly, current QA systems and search engines fail to extract and present, as search results, self-contained and well-formed passages from documents that contain the information necessary to answer the user's question and that make clear how each passage relates to the relevant parts of the document.