1. Field of the Invention
This invention relates to a question answering system, a data search method, and a computer program, and more particularly to a question answering system, a data search method, and a computer program that can provide a more precise answer to a question in a system wherein the user enters a question sentence and an answer to the question is provided.
2. Description of the Related Art
Recently, network communications through the Internet, etc., have grown in use and various services are provided through the network. One of the services provided through the network is a search service. In the search service, for example, a search server receives a search request from a user terminal such as a personal computer or a mobile terminal connected to the network, executes a process responsive to the search request, and transmits the processing result to the user terminal.
For example, to execute a search process through the Internet, the user accesses a Web site providing a search service, enters search conditions such as a keyword and a category in accordance with a menu presented by the Web site, and transmits the search conditions to a server. The server executes a process in accordance with the search conditions and displays the processing result on the user terminal.
Data search processes come in various modes. For example, a keyword-based search system wherein the user enters a keyword and list information of the documents containing the entered keyword is presented to the user, a question answering system wherein the user enters a question sentence and an answer to the question is provided, and the like are available. In the question answering system, the user need not select a keyword and can receive only the answer to the question; such systems are widely used.
Most question answering systems extract answer candidates to a question from a so-called open-domain document set, that is, a document set that is not organized, for example, various Web pages or a database that can be accessed. For such question answering systems, which extract answer candidates, techniques are being researched for checking whether or not each answer candidate obtained by searching is an appropriate answer to the question from the client.
For example, “Question Answering using Common Sense Knowledge latent in Corpora and Utility Maximization Principle” (Tomoyosi AKIBA, Atsushi FUJII and Katunobu ITOU, Japan Information Processing Society Research Report, 2004-NL-163, pp. 131-138) discloses an art of checking, using a text set other than the search target text set applied to extraction of answer candidates, whether or not an answer candidate extracted by searching is appropriate as an answer to the question. Specifically, this document discloses processing of checking whether or not the question focus from a client and an answer candidate obtained by searching have a hypernym-hyponym relation in a thesaurus, for example, or, if the question sentence asks for a numeric value as an answer, processing of checking whether or not an answer candidate obtained by searching matches the question focus. This document further discloses a configuration for inspecting validity of an answer to the question using a determination pattern representing a relation between the question focus and the acquired answer candidate, and a corpus (search target language data). JP 2004-118647 A also discloses a processing configuration for inspecting quantity representation, for example, checking that “meters” or “feet” is adequate for representing the elevation.
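The two kinds of checks mentioned above (a hypernym-hyponym check against a thesaurus and a check that a unit is adequate for a quantity) can be illustrated with a minimal sketch. The thesaurus entries, the unit table, and the function names `is_hyponym_of` and `unit_is_adequate` are assumptions introduced only for illustration; they are not taken from the cited report or from JP 2004-118647 A.

```python
# Hypothetical toy thesaurus: maps a term to its direct hypernym (broader term).
# The entries are illustrative assumptions, not data from the cited documents.
HYPERNYM = {
    "akira kurosawa": "film director",
    "film director": "person",
    "keizo obuchi": "politician",
    "politician": "person",
}


def is_hyponym_of(term, ancestor):
    """Return True if `ancestor` is reachable from `term` via hypernym links."""
    target = ancestor.lower()
    current = term.lower()
    seen = set()  # guards against cycles in the toy thesaurus
    while current in HYPERNYM and current not in seen:
        seen.add(current)
        current = HYPERNYM[current]
        if current == target:
            return True
    return False


# Hypothetical table of units considered adequate for a given quantity.
ADEQUATE_UNITS = {"elevation": {"meters", "feet"}}


def unit_is_adequate(quantity, unit):
    """Return True if `unit` is listed as adequate for `quantity`."""
    return unit in ADEQUATE_UNITS.get(quantity, set())


print(is_hyponym_of("Akira Kurosawa", "film director"))
print(is_hyponym_of("Keizo Obuchi", "film director"))
print(unit_is_adequate("elevation", "feet"))
```

In this sketch an answer candidate is accepted only when the question focus appears among its broader terms, so a candidate recorded only as a politician would be rejected for the focus “film director.”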
Here, it is noted that the notion of “question focus” was first introduced by Wendy Lehnert in her book “The Process of Question Answering.” In this book, at page 6, section 1.1-7, the focus of a question is defined as the question concept that embodies the information expectations expressed by the question. For example, consider the question sentence “Who is the President of United States?” The “question type” of this question sentence is a question inquiring about a person. In other words, the question type means “who”, “what”, “when” and the like. The “question type” is also called a “main topic” of a question. On the other hand, the “question focus” of this question sentence is the President of United States. The “question focus” is also called a “query subtopic,” “topic of question” or “question subject.”
Thus, several arts of determining the validity of an answer candidate, which is found using an open-domain information source (also called knowledge source), in the question answering system have been proposed. However, such an answer candidate inspection method basically requires the following procedure:
a. QF (question focus) is extracted from a question sentence using a handcrafted pattern. For example, “film director” is acquired as QF from a question sentence of “Who is a film director presented the People's Honor Award?”
b. Searching based on the QF is executed according to a technique similar to that of the existing question answering system, and answer candidates are acquired. For example, “Keizo Obuchi” and “Akira Kurosawa” are obtained. It is noted that Keizo Obuchi (Jun. 25, 1937-May 14, 2000) was a Japanese politician and the 84th Prime Minister of Japan from Jul. 30, 1998 to Apr. 5, 2000.
c. A pattern made up of the QF (question focus) and each answer candidate is generated, and a corpus (search target language data) is searched using the generated pattern as a search character string. For example, the corpus is searched again using the character strings “a film director named Keizo Obuchi” and “a film director named Akira Kurosawa” as patterns. If a hit is found for a pattern, it is determined that the answer candidate applied to that pattern has high validity for the question, and only such an answer candidate is output as the answer to the question.
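Steps a to c above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited system: the regular expression `QF_PATTERN`, the function names, and the two corpus sentences are hypothetical, and step b (the search that yields answer candidates) is replaced by a fixed list.

```python
import re

# Step a: a hypothetical handcrafted pattern that picks out the noun phrase
# following "Who is a/an" as the question focus (QF).
QF_PATTERN = re.compile(
    r"^Who is an? (?P<qf>[a-z ]+?) (?:who|presented)\b", re.IGNORECASE
)


def extract_question_focus(question):
    """Return the QF matched by the handcrafted pattern, or None."""
    match = QF_PATTERN.match(question)
    return match.group("qf") if match else None


def make_pattern(qf, candidate):
    """Step c: build the search string from the QF and one candidate."""
    return f"a {qf} named {candidate}"


def inspect_candidates(qf, candidates, corpus):
    """Keep only the candidates whose pattern occurs somewhere in the corpus."""
    valid = []
    for candidate in candidates:
        pattern = make_pattern(qf, candidate).lower()
        if any(pattern in sentence.lower() for sentence in corpus):
            valid.append(candidate)
    return valid


# Hypothetical corpus sentences, written only for this illustration.
CORPUS = [
    "The retrospective honored a film director named Akira Kurosawa.",
    "Keizo Obuchi served as Prime Minister of Japan.",
]

question = "Who is a film director presented the People's Honor Award?"
qf = extract_question_focus(question)

# Step b is stubbed: assume an ordinary search returned these candidates.
candidates = ["Keizo Obuchi", "Akira Kurosawa"]

answers = inspect_candidates(qf, candidates, CORPUS)
print(qf, answers)
```

Substring matching is used here only to keep the sketch short; a real corpus search would typically go through an index or a search engine.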
However, in the answer inspection technique described above, a pattern made up of the QF (question focus) and the answer candidates is generated and a search is made. Only the QF (question focus) is acquired from the question sentence input from the client; the configuration does not acquire any further information to be used in the inspection.
Such an inspection technique may cause the following problem. For example, consider the following question:
Question
“Who is a baseball player who went to Hiroshima in 2003?”
A word acquired as the QF (question focus) for this question is “baseball player”. For example, the following patterns are generated for answer candidates (A, B, . . . ) obtained as the search result:
[A is a baseball player]
[B is a baseball player]
Then, inspection is conducted by searching a corpus using these generated patterns.
However, the inspection may not be sufficient in some cases. That is, suppose that a user who inputs
Question
“Who is a baseball player who went to Hiroshima in 2003?” intends “Hiroshima” in this question sentence to mean the baseball team “Hiroshima Carp.” In this case, the answer candidates obtained using the search keywords of “2003, Hiroshima, baseball player” probably contain baseball players other than baseball players of Hiroshima Carp. The answer candidates obtained based on the keywords of “2003, Hiroshima, player” may contain names of baseball players of other Japanese professional baseball teams, such as teams that play against Hiroshima Carp and teams with which Hiroshima Carp trades baseball players. It is noted that Hiroshima Carp is one of the professional baseball teams in Japan's Central League.
At this time, if only “baseball player” is extracted as the QF (question focus) from the question, and inspection is conducted using a character string pattern made up of the QF (question focus) and an answer candidate, it is quite likely that a hit sentence will appear in the corpus even for an answer candidate who is a baseball player of another baseball team. Such an answer candidate passes the inspection, resulting in an erroneous determination that it is a valid answer to the user's question.
For example, if a player named “YANO,” who is a player of Hanshin Tigers (another one of the professional baseball teams in Japan's Central League), is obtained as an answer candidate, according to the technique described above,
QF (question focus) for question=“baseball player”
answer candidate=“YANO”
are used to generate a character string pattern of “YANO of a baseball player”. If the corpus is searched with the character string pattern of “YANO of a baseball player” as a query, the probability that a hit sentence will be found in the corpus is sufficiently high. Alternatively, if a character string of “a baseball player who went to Hiroshima” is used as the QF, lexical semantic ambiguity remains as to whether “Hiroshima” in the QF means a “place name” or a “sports team,” and valid inspection may not be conducted.
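The problem described above can be reproduced with a small sketch. The corpus sentences and the function name `passes_inspection` are hypothetical and exist only to illustrate the failure mode; the pattern follows the [A is a baseball player] form shown earlier.

```python
def passes_inspection(qf, candidate, corpus):
    """Return True if the pattern "<candidate> is a <qf>" occurs in the corpus."""
    pattern = f"{candidate} is a {qf}".lower()
    return any(pattern in sentence.lower() for sentence in corpus)


# Hypothetical corpus sentences, written only for this illustration.
CORPUS = [
    "YANO is a baseball player of Hanshin Tigers.",
    "YANO is a baseball player who went to Hiroshima for an away game.",
]

# With QF = "baseball player", the candidate passes although the matching
# sentence says nothing about Hiroshima Carp.
short_qf_hit = passes_inspection("baseball player", "YANO", CORPUS)

# With the longer QF, the hit sentence uses "Hiroshima" as a place name,
# so the match still does not confirm membership of the team.
long_qf_hit = passes_inspection(
    "baseball player who went to Hiroshima", "YANO", CORPUS
)

print(short_qf_hit, long_qf_hit)
```

Both checks succeed for a player of another team, which is the erroneous determination the text describes: string matching on the QF alone cannot tell which sense of “Hiroshima” the user intended.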