The field of automatic information retrieval from natural language text databases has in the past focused on retrieving documents that match one or more key words given in a user query. For example, most conventional search engines on the Internet use Boolean search to match key words given by the user. Such key words are conventionally considered to be indicative of topics, and the task of a standard information retrieval system has been seen as matching a user topic with document topics. Because the text databases searched by information retrieval systems today are immense, such as the entire text database available on the Internet, this type of search has become a very blunt tool for information retrieval. A search most likely results in an unwieldy number of documents. Thus, the user must spend considerable effort to find the most relevant documents among those retrieved, and then to find the desired information within those documents. Furthermore, due to the ambiguity of words and of the way they are used in a text, many of the documents retrieved are irrelevant. This makes it even more difficult for the user to find the information needed.
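The key word matching described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection and a simple inverted index; the function names `build_index` and `boolean_and`, the tokenization, and the sample documents are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # Strip trailing/leading punctuation so "bank." matches "bank".
            index[word.strip(".,;:!?")].add(doc_id)
    return index


def boolean_and(index, keywords):
    """Return the ids of documents that contain every key word (Boolean AND)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


# Hypothetical sample documents, chosen to show word ambiguity.
docs = {
    1: "The bank raised interest rates.",
    2: "We walked along the river bank.",
    3: "Interest in river cruises is growing.",
}
index = build_index(docs)

# A single ambiguous key word matches documents on unrelated topics.
print(boolean_and(index, ["bank"]))
# Adding a second key word narrows the result set.
print(boolean_and(index, ["river", "bank"]))
```

In this sketch the query "bank" retrieves both the financial and the riverside document, which is the kind of irrelevant match caused by word ambiguity that the paragraph describes.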
The performance of an information retrieval system is usually measured in terms of its recall and its precision. In information retrieval, the technical term recall has a standard definition as the ratio of the number of relevant documents retrieved for a given query to the total number of relevant documents for that query. Thus, recall measures the exhaustiveness of the search results. The technical term precision has a standard definition as the ratio of the number of relevant documents retrieved for a given query to the total number of documents retrieved. Thus, precision measures the quality of the search results. Because the above type of search method retrieves so many documents, it has been realized within the art that there is a need to reduce the retrieved documents to the most relevant ones. In other words, as the number of documents in the text database increases, recall becomes less important and precision becomes more important. Therefore, suppliers of information retrieval systems have enhanced Boolean search by using, among other things, relevance ranking based on statistical methods. However, it is well known that the documents ranked highly in this way still include irrelevant documents.
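The two definitions above can be made concrete with a short worked sketch. The function name `precision_recall` and the document identifiers are hypothetical, and the convention of returning 0.0 when a denominator is empty is an assumption made for this illustration only.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: ids of the documents returned for the query
    relevant:  ids of all documents that are relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 8 documents retrieved, 4 relevant in total, 2 in common.
retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
relevant = {"d1", "d2", "d9", "d10"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # precision = 2/8 = 0.25, recall = 2/4 = 0.5
```

Retrieving more documents for the same query can only keep recall the same or raise it, but it tends to lower precision, which is why precision dominates as the database grows.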
Questions are a specific type of query. In the field of computerized question answering, systems range from those delivering answers to simple questions to those presenting complex results compiled from different sources of information. How well a question is answered is typically judged by human standards. In other words, the benchmark is how a well-informed human being would respond to the question with respect to the correctness and exhaustiveness of the answer (if there is more than one answer), the succinctness of the answer in relation to the question posed, and the speed with which the answer is delivered.
A basic difficulty for question answering systems is that, as opposed to general information retrieval systems, the fact sought is often very specific. Thus, the need for precision becomes even more acute.
Many prior art question answering systems suffer from being dependent on knowledge specific to a domain, a line of business or a special trade. World knowledge that is optimal for one domain is of little value to another and is thus hard to port. Automatically updating the world knowledge of a domain-specific question answering system is not technically feasible, and such systems do not scale well.
Other prior art question answering systems that are independent of genre or domain are often restricted with regard to the type of question a user can ask, for example to closed-class questions. Closed-class questions are direct questions whose answers are all assumed to lie in a set of objects and to be expressible as noun phrases.