1. Field of the Invention
This invention relates to a question answering system, a data search method, and a computer program, and more particularly to a question answering system, a data search method, and a computer program for making it possible to provide a precise answer by dividing a question sentence into sub-questions, searching for answer candidates based on the sub-questions, and selecting the final answer from among the found answer candidates.
2. Description of the Related Art
Recently, network communications through the Internet, etc., have grown in use and various services have been provided through the network. One of the services through the network is search service. In the search service, for example, a search server receives a search request from a user terminal such as a personal computer or a mobile terminal connected to the network, executes a process responsive to the search request, and transmits the process result to the user terminal.
For example, to execute a search process through the Internet, a user accesses a Web site providing search service, enters search conditions such as a keyword, a category, etc., in accordance with a menu presented by the Web site, and transmits the search conditions to a server. The server executes a process in accordance with the search conditions and displays the process result on the user terminal.
A data search process is implemented in various modes. For example, there are a keyword-based search system, in which the user enters a keyword and list information of documents containing the entered keyword is presented to the user, and a question answering system, in which the user enters a question sentence and an answer to the question is provided. The question answering system is a system in which the user need not select a keyword and can receive only the answer to the question; it is widely used.
For example, JP 2002-132811 A discloses a typical question answering system. JP 2002-132811 A discloses a configuration for determining a search-word set and a question type from a question sentence, searching a document set stored in a document-set storage unit for a relevant-document set in accordance with the determined search-word set and the question type, extracting an answer to the question sentence from relevant documents, and providing the extracted answer and document information from which the answer is extracted as an answering result to the question sentence.
In a general question answering system, the question sentence provided by the user is input and the answer to the question sentence is output without outputting the whole hit document. Often, web information is used as a knowledge source to obtain an answer. Under the present circumstances, however, it is difficult to say that the question answering system has sufficient answering accuracy, and the question answering system is less widespread than a general keyword-based search system.
In the current typical question answering system, a process is executed according to the following procedure: First, content words (phrases) are extracted from a question sentence. Next, the obtained content words are used as search words to search the knowledge sources (e.g., Web pages) for an answer to the question. Finally, an answer is extracted from the search result. For example, if the question is “How many hours is the time difference between Japan and Brazil?”, “Japan,” “Brazil,” and “time difference” are extracted as the content words (phrases) and are used as search words (phrases) to make a search. “How many hours” is usually not used for the search because it contains an interrogative. As the search is made, text such as “The time difference between Japan and Brazil is 12 hours” is obtained from the knowledge sources, and “12 hours” can be extracted as an answer. The search technique in question answering is described in “NTT's Question Answering System for NTCIR QAC2” (H. Isozaki, Working Notes of NTCIR-4 Workshop, pp. 326-332, (2004)).
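The three-step procedure described above can be illustrated with a minimal Python sketch. It is not the method of the cited system. The interrogative phrase list, the stop-word list, the three-passage knowledge source, and the function names `extract_content_words`, `search`, and `extract_answer` are all assumptions made for illustration, and the sketch works on single tokens ("time", "difference") rather than on phrases such as "time difference".

```python
import re

# Hypothetical lists, chosen only for this illustration.
INTERROGATIVE_PHRASES = ["how many hours", "how many", "what", "who", "where", "when"]
STOP_WORDS = {"is", "the", "between", "and", "a", "an", "of", "in", "to"}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_content_words(question):
    """Step 1: drop interrogative phrases and stop words; keep the rest."""
    text = question.lower()
    for phrase in INTERROGATIVE_PHRASES:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return [t for t in tokenize(text) if t not in STOP_WORDS]


def search(passages, search_words):
    """Step 2: return the passage that contains the most search words."""
    def score(passage):
        tokens = set(tokenize(passage))
        return sum(1 for w in search_words if w in tokens)
    return max(passages, key=score)


def extract_answer(passage):
    """Step 3: pull out a quantity of hours, the answer type assumed here."""
    match = re.search(r"\b\d+ hours?\b", passage)
    return match.group(0) if match else None


# Toy knowledge source (assumed); only the first passage comes from the text above.
KNOWLEDGE_SOURCE = [
    "The time difference between Japan and Brazil is 12 hours",
    "Japan is an island country in East Asia",
    "Brazil is the largest country in South America",
]

question = "How many hours is the time difference between Japan and Brazil?"
content_words = extract_content_words(question)
best_passage = search(KNOWLEDGE_SOURCE, content_words)
answer = extract_answer(best_passage)
print(content_words, "->", answer)
```

In this sketch the answer type ("a number of hours") is hard-coded in `extract_answer`; a real system would presumably derive it from the interrogative phrase that was removed in the first step.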
In the technique described above, however, the search gives high priority to search results containing as many of the content words in the question sentence as possible. Therefore, if the question sentence is long and complicated, an appropriate search result may not be obtained; this is a problem.
For example, if the question is
    “What is the wooden roller coaster located in YOMIURI Land?” (the right answer is “White Canyon”),
    (Note that “YOMIURI Land” is the name of an amusement park located in Tokyo, Japan.)
“YOMIURI Land,” “wooden,” and “roller coaster” are obtained as search words, and a search process that gives high priority to the search result containing as many search words as possible is executed.
Therefore, if the descriptions
    “White Canyon is a roller-coaster located in YOMIURI Land”, and
    “White Canyon is a wooden roller coaster”
exist in different locations of the knowledge sources, the following description, which happens to contain all the search words,
    “Bandit located in YOMIURI Land resembles Elf in HIRAKATA Park, a wooden roller coaster”
    (Note that “HIRAKATA Park” is the name of an amusement park located in Osaka, Japan.)
is retrieved preferentially, and consequently an erroneous answer of “Bandit” or “Elf” is extracted.
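The failure described above can be reproduced with a short Python sketch. The ranking rule below (count how many search words occur in each description, highest count first) is a simplified stand-in for the prioritization described in the text, not the scoring of any particular system. The names `normalize`, `count_hits`, and `ranked` are hypothetical.

```python
# Search words obtained from the question, as stated in the text.
SEARCH_WORDS = ["yomiuri land", "wooden", "roller coaster"]

# The three descriptions quoted in the text.
DESCRIPTIONS = [
    "White Canyon is a roller-coaster located in YOMIURI Land",
    "White Canyon is a wooden roller coaster",
    "Bandit located in YOMIURI Land resembles Elf in HIRAKATA Park, "
    "a wooden roller coaster",
]


def normalize(text):
    """Lowercase and treat hyphens as spaces so 'roller-coaster' matches."""
    return text.lower().replace("-", " ")


def count_hits(description, search_words):
    """Number of search words (phrases) that occur in the description."""
    text = normalize(description)
    return sum(1 for word in search_words if word in text)


# Rank descriptions so that those containing more search words come first.
ranked = sorted(
    DESCRIPTIONS,
    key=lambda d: count_hits(d, SEARCH_WORDS),
    reverse=True,
)

for description in ranked:
    print(count_hits(description, SEARCH_WORDS), description)
```

Each of the two correct descriptions contains only two of the three search words, while the description about “Bandit” and “Elf” contains all three and is therefore ranked first.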
Thus, in the question answering systems disclosed in the above publication and document, if the question sentence is long, it generally becomes extremely unlikely that the knowledge sources contain an appropriate description containing all the content words of the question sentence. As a result, the possibility that an erroneous answer is extracted becomes high; this is a problem.