1. Field of the Invention
This invention relates to a question answering system, a data search method, and a computer program, and more particularly to a question answering system, a data search method, and a computer program for making it possible to efficiently provide a highly accurate answer in a system in which a user enters a question sentence and an answer to the question is provided.
2. Description of the Related Art
Recently, network communications through the Internet, etc., have grown in use, and various services are provided through the network. One such service is a search service. In a search service, for example, a search server receives a search request from a user terminal such as a personal computer or a mobile terminal connected to the network, executes a process responsive to the search request, and transmits the process result to the user terminal.
For example, to execute a search process through the Internet, a user accesses a Web site providing a search service, enters search conditions such as a keyword, a category, etc., in accordance with a menu presented by the Web site, and transmits the search conditions to a server. The server executes a process in accordance with the search conditions and displays the process result on the user terminal.
A data search process can be implemented in various modes. Examples include a keyword-based search system, in which the user enters a keyword and is presented with list information of documents containing the entered keyword, and a question answering system, in which the user enters a question sentence and is provided with an answer to the question. In a question answering system, the user need not select a keyword and receives only the answer to the question; such systems are widely used.
For example, JP 2002-132811 A discloses a typical question answering system. The disclosed configuration determines a search-word set and a question type from a question sentence, searches a document set stored in a document-set storage unit for a relevant-document set in accordance with the determined search-word set and question type, extracts an answer to the question sentence from the relevant documents, and provides the extracted answer, together with information on the document from which the answer was extracted, as the answering result to the question sentence.
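The flow described above (question analysis, document search, answer extraction) can be illustrated with a minimal sketch. It is not the implementation of JP 2002-132811 A; the stop-word list, the question-type table, the year-only extractor, and the two sample documents are all hypothetical and chosen only to keep the example small.

```python
import re

# Hypothetical resources for illustration only.
STOP_WORDS = {"when", "where", "who", "was", "is", "the", "of", "in"}
QUESTION_TYPES = {"when": "DATE", "where": "PLACE", "who": "PERSON"}
# Only a DATE extractor (a four-digit year) is provided in this toy example.
EXTRACTORS = {"DATE": re.compile(r"\b\d{4}\b")}

# Hypothetical document set standing in for the document-set storage unit.
DOCUMENTS = {
    "d1": "Ieyasu Tokugawa was born in 1543.",
    "d2": "The Taj Mahal is in Agra.",
}


def analyze(question):
    """Determine a search-word set and a question type from a question."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    qtype = next((QUESTION_TYPES[w] for w in words if w in QUESTION_TYPES), None)
    search_words = {w for w in words if w not in STOP_WORDS}
    return search_words, qtype


def search(documents, search_words):
    """Return the ids of documents that contain every search word."""
    hits = []
    for doc_id, text in documents.items():
        doc_words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if search_words <= doc_words:
            hits.append(doc_id)
    return hits


def answer(question, documents):
    """Return (answer, source document id), or None if nothing is found."""
    search_words, qtype = analyze(question)
    extractor = EXTRACTORS.get(qtype)
    if extractor is None:
        return None  # no extractor for this question type in the sketch
    for doc_id in search(documents, search_words):
        match = extractor.search(documents[doc_id])
        if match:
            return match.group(0), doc_id
    return None
```

Under these assumptions, `answer("When was Ieyasu TOKUGAWA born?", DOCUMENTS)` returns the extracted year together with the id of the document it came from, mirroring the pairing of answer and document information described above.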
In a general question answering system, the question sentence provided by the user is input, and the answer to the question sentence is output rather than the whole of each hit document. Web information is often used as the knowledge source from which the answer is obtained. Under the present circumstances, however, it is difficult to say that the question answering system has sufficient answering accuracy, and the question answering system is less widespread than a general keyword-based search system.
On the other hand, it is known that typical question patterns exist in the question answering system. For example, the typical question patterns include the following:
When was {Ieyasu TOKUGAWA} born?
{Where} is {the capital} of {Congo}?
{Where} is {Taj Mahal}?
It is noted that Ieyasu TOKUGAWA was the founder of the Tokugawa bakufu of Japan, which ruled from 1600 to 1868, and that Ieyasu was the first shogun of the Tokugawa bakufu.
In these questions, replacing a word enclosed in { } with another word generates various questions of the same question pattern. For example,
“When was {Ieyasu TOKUGAWA|Yoritomo MINAMOTO|Genpaku SUGITA} born?”
“{Where|how many people|who} is {the capital|population|prime minister} of {Congo|Estonia|Latvia}?”
“{Where} is {Taj Mahal|Angkor Wat|opera house}?”
In these questions, {a|b|c} indicates that a, b, and c are interchangeable. It is noted that Yoritomo MINAMOTO and Genpaku SUGITA are the names of Japanese historical persons.
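As an illustration of the {a|b|c} notation, the following minimal sketch expands a pattern into the individual questions it stands for. The function name `expand_pattern` is hypothetical. The sketch combines the alternatives of every slot mechanically, so for a pattern with several slots it also produces combinations that are not meaningful questions.

```python
import itertools
import re

# A slot is text enclosed in { }, with alternatives separated by "|".
SLOT = re.compile(r"\{([^{}]*)\}")


def expand_pattern(pattern):
    """Return every question obtained by choosing one alternative per slot."""
    # With one capture group, re.split alternates literal text and slot contents.
    parts = SLOT.split(pattern)
    literals = parts[0::2]
    slots = [p.split("|") for p in parts[1::2]]
    questions = []
    for choice in itertools.product(*slots):
        pieces = [literals[0]]
        for word, literal in zip(choice, literals[1:]):
            pieces.append(word)
            pieces.append(literal)
        questions.append("".join(pieces))
    return questions
```

For example, `expand_pattern("{Where} is {Taj Mahal|Angkor Wat|opera house}?")` yields the three questions of the third pattern above, one per landmark.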
Thus, it is known as an empirical rule (Zipf's law) that the questions presented by users of a question answering system fall into a small number of typical question patterns, and that these typical question patterns cover most of the questions asked. This is described in detail in “Question Answering Techniques for the World Wide Web,” (Jimmy Lin and Boris Katz, Tutorial presentation at The 11th Conference of Computational Linguistics (2003)).
“Omnibase: Uniform access to heterogeneous data for question answering” (Boris Katz, Sue Felshin, Deniz Yuret, Ali Ibrahim, Jimmy Lin, Gregory Marton, Alton Jerome McFarland, and Baris Temelkuran, In Proceedings of the 7th International Workshop on Applications of Natural Language to Information Systems (2002)) proposes a technique of manually preparing sets, each consisting of a typical question pattern and a Web page that comprehensively contains the answers to questions of that pattern, thereby dramatically improving the answering accuracy for questions matching the question pattern. For example, a Web page having a list of country names and capitals is specified in advance for the question pattern “Where is the capital of [country name]?”, which covers questions such as
Where is the capital of USA?
Where is the capital of England?
If a question matching the question pattern is input to the system, the list is referenced and the capital corresponding to the specified country name is output as the answer, which makes it possible to return an error-free answer efficiently.
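The lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of the cited work: the `CAPITALS` dictionary is a hypothetical stand-in for the manually specified Web page listing countries and capitals, and the regular expression is one assumed way of matching the question pattern.

```python
import re

# Hypothetical stand-in for the Web page that lists country names and capitals.
CAPITALS = {
    "usa": "Washington, D.C.",
    "england": "London",
}

# The question pattern "Where is the capital of [country name]?"
PATTERN = re.compile(r"^Where is the capital of (?P<country>.+?)\?$", re.IGNORECASE)


def answer(question):
    """Return the capital if the question matches the pattern, else None."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None  # the question does not match this pattern
    country = match.group("country").strip().lower()
    return CAPITALS.get(country)  # None if the country is not in the list
```

A question that matches the pattern is answered directly from the list, while a question that does not match, or that names a country absent from the list, yields `None` and would have to be handled by some other means.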
However, the above-described technique, namely, manually preparing sets of a typical question pattern and a Web page comprehensively containing the answers, requires that a Web page comprehensively containing the answers be specified in advance for each typical question pattern, which takes enormous man-hours. Further, the maintenance cost of dealing with the discontinuation, drastic content change, or URL change of such a Web page becomes extremely high.