1. Field of the Invention
This invention relates to a question answering system, a data search method, and a computer program, and more particularly to a question answering system, a data search method, and a computer program that can provide a more precise answer to a question in a system in which the user enters a question sentence and receives an answer to the question.
2. Description of the Related Art
Recently, network communication through the Internet and similar networks has come into wide use, and various services are provided over networks. One such service is a search service. In a search service, for example, a search server receives a search request from a user terminal connected to the network, such as a personal computer or a mobile terminal, executes a process responsive to the request, and transmits the processing result to the user terminal.
For example, to execute a search process through the Internet, the user accesses a Web site providing a search service, enters search conditions such as a keyword or a category in accordance with a menu presented by the Web site, and transmits the search conditions to a server. The server executes a process in accordance with the search conditions and transmits the processing result, which is displayed on the user terminal.
Data search processes take various forms. Examples include a keyword-based search system, in which the user enters a keyword and is presented with a list of the documents containing that keyword, and a question answering system, in which the user enters a question sentence and is provided with an answer to the question. Because the user of a question answering system need not select keywords and receives only the answer to the question, such systems are widely used.
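To make the keyword-based mode concrete, the following is a minimal sketch of an inverted index that returns the documents containing an entered keyword. It is an illustration only, assuming whitespace tokenization and exact single-word matching; the names `build_index` and `keyword_search` are hypothetical and are not taken from any system described here.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the IDs of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,?!")].add(doc_id)  # drop trailing punctuation
    return index


def keyword_search(index: dict[str, set[str]], keyword: str) -> list[str]:
    """Return the sorted list of document IDs that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


docs = {
    "d1": "A pine tree was decorated in the park.",
    "d2": "Generally, a fir tree is used as a Christmas tree.",
}
index = build_index(docs)
print(keyword_search(index, "tree"))  # ['d1', 'd2']
```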
For example, JP 2002-132811 A discloses a typical question answering system that includes a question analysis section. From the question sentence presented by the user, the question analysis section determines the search word (keyword) to be applied in searching and the question type. The question answering system then performs a search based on the search word (keyword), applies an answer extraction rule to the search result, which is a sentence containing the search word (keyword), to extract answer candidates, and ranks and outputs the obtained answer candidates.
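The four steps attributed to JP 2002-132811 A (question analysis, keyword search, rule-based answer extraction, and ranking) can be sketched as a toy pipeline. Everything in it is hypothetical: the stop-word list, the interrogative-to-type table, the "capitalized phrase after 'in'" extraction rule, and ranking by the number of matched search words are assumptions chosen for brevity, not the rules of the cited publication.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list and interrogative-to-type table (illustration only).
STOPWORDS = {"where", "who", "when", "is", "was", "a", "an", "the", "of"}
QTYPES = {"where": "place", "who": "person", "when": "time"}


@dataclass
class Candidate:
    text: str
    score: int  # number of search words in the sentence the candidate came from


def analyze_question(question: str) -> tuple[str, list[str]]:
    """Question analysis: derive the question type and the search words."""
    words = re.findall(r"[a-z]+", question.lower())
    qtype = QTYPES.get(words[0], "other") if words else "other"
    return qtype, [w for w in words if w not in STOPWORDS]


def search(sentences: list[str], keywords: list[str]) -> list[str]:
    """Keyword search: keep sentences containing at least one search word."""
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


def extract(sentences: list[str], qtype: str, keywords: list[str]) -> list[Candidate]:
    """Toy extraction rule: for 'place' questions, a capitalized phrase after 'in'."""
    found = []
    for s in sentences:
        score = sum(k in s.lower() for k in keywords)
        if qtype == "place":
            for m in re.finditer(r"\bin ((?:[A-Z][a-z]+ ?)+)", s):
                found.append(Candidate(m.group(1).strip(), score))
    return found


def answer(question: str, sentences: list[str], top_n: int = 3) -> list[str]:
    """Run the whole pipeline and return the top-ranked answer candidates."""
    qtype, keywords = analyze_question(question)
    ranked = sorted(extract(search(sentences, keywords), qtype, keywords),
                    key=lambda c: c.score, reverse=True)
    return [c.text for c in ranked[:top_n]]


sentences = ["The museum is in Kyoto Station.", "The old museum was in Nara Park."]
print(answer("Where is the old museum?", sentences))  # ['Nara Park', 'Kyoto Station']
```

With these two sentences, the one containing both search words outranks the one containing only "museum", so "Nara Park" is listed first.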
The search result obtained with the search word (keyword) is, for example, an article of a document, and is generally made up of a plurality of sentences. One problem is how to improve the accuracy with which appropriate words are selected from such a search result as answer candidates to the question.
For example, “NTT's Question Answering System for NTCIR QAC2” (Isozaki, H., Working Notes of NTCIR-4 Workshop, pp. 326-332 (2004); hereinafter referred to as non-patent document 1) discloses a configuration that treats one unit (passage) of the text contained in a search result as a variable-length morpheme string (window), searches for a window range containing a given search word and uses it as the search target passage, and applies a preset answer extraction rule to that passage to extract answer candidates efficiently. In this way, most existing question answering systems locate the portion that may contain an answer candidate on the principle of taking the portion close to the keywords contained in the question sentence. “Importance of Pronominal Anaphora Resolution in Question Answering System” (Jose Luis Vicedo and Antonio Ferrandez, ACL 2000; hereinafter referred to as non-patent document 2) points out the importance of anaphoric analysis in a question answering system, namely, of determining whether the noun phrases and pronouns contained in the text of the search result represent the same entity, and describes that extracting answer candidates with anaphoric analysis applied is effective.
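One plausible reading of the window idea in non-patent document 1 is to look for the shortest span of tokens that covers every search word occurring in the text and to treat that span as the search target passage. The sketch below implements that reading with a sliding window; whitespace tokens stand in for morphemes, and both the shortest-span criterion and the function name `best_window` are assumptions rather than the published algorithm.

```python
from __future__ import annotations

from collections import Counter


def best_window(tokens: list[str], keywords: set[str]) -> tuple[int, int] | None:
    """Return the shortest span [start, end) of tokens that contains every
    keyword occurring anywhere in tokens, or None if no keyword occurs."""
    present = keywords & set(tokens)  # keywords absent from the text are ignored
    if not present:
        return None
    counts: Counter[str] = Counter()
    covered = 0  # number of distinct present keywords inside the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in present:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers every keyword.
        while covered == len(present):
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            drop = tokens[left]
            if drop in present:
                counts[drop] -= 1
                if counts[drop] == 0:
                    covered -= 1
            left += 1
    return best


tokens = "the park has a pine tree and the pine tree is popular".split()
span = best_window(tokens, {"pine", "popular", "enthusiasts"})
print(span, tokens[span[0]:span[1]])  # (8, 12) ['pine', 'tree', 'is', 'popular']
```

Because the window is anchored on keyword occurrences, any text outside it (for example, a place name two sentences earlier) is never examined, which is the limitation discussed next.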
As described above, several proposals have been made for extracting answer candidates to a question efficiently and with high accuracy from the text obtained as the search result based on the search word (keyword), in a question answering system that provides an answer to a user's question. However, the technique of non-patent document 1 attempts to extract an answer candidate on the assumption that the portion containing the answer candidate lies in the vicinity of a keyword in a sentence containing a search keyword. Because this technique does not consider the context of the searched document, the system cannot provide the correct answer if no appropriate answer candidate exists before or after the text portion that best matches the keywords; this is a problem.
Non-patent document 2 recommends applying anaphoric analysis processing. However, when anaphoric analysis processing is applied to a document obtained as a search result and the context of that document is complicated, the correct answer cannot be obtained in some cases; this is a problem. A specific example is discussed below.
By way of example, assume that the question input by a client (questioner user) is the following sentence:
Question sentence: “Where is a Christmas tree of a pine tree popular among enthusiasts?”
Assume that this question is input to a question answering system. Because the question contains “Where is . . . ?”, the question answering system determines that it is a question about a “place.” Most existing techniques perform this question type determination processing.
Further, the question answering system extracts from the question sentence the search words (keywords) to be applied in searching. Here, it is assumed that the keywords “Christmas tree,” “pine tree,” “popular,” and “enthusiasts” are extracted, and that searching documents with these keywords finds one article made up of the following sentences:
Sentence 1: Preparations for the year change period started on the 20th of December in Hata City Castle Ruin Park.
Sentence 2: In the park, a pine tree was decorated and became a Christmas tree.
Sentence 3: Generally, a fir tree is used as a Christmas tree.
Sentence 4: A giant Christmas tree of Rockefeller Center in New York is world-famous.
Sentence 5: However, this tree of the pine tree also has a wonderful appeal and is popular among enthusiasts.
This article, made up of sentences 1 to 5, was extracted by a keyword search using “Christmas tree, pine tree, popular, enthusiasts” over the databases and Web pages that serve as search targets (knowledge sources).
In this article, sentence 2 mentions the topic of the Christmas tree in Hata City Castle Ruin Park, sentences 3 and 4 then turn to the topic of Christmas trees in general, and sentence 5 returns to the topic of the Christmas tree in Hata City Castle Ruin Park.
Next, the match degree between each of sentences 1 to 5 and the keywords is analyzed. Sentence 2 contains two keywords and sentence 5 contains three, so sentences 2 and 5 are determined to have a high match degree with the keywords. In the existing answer candidate extraction method, “New York” or “Rockefeller Center” in sentence 4, which are “place” expressions in the vicinity of sentence 5, the sentence that best matches the keywords, are preferentially selected as answer candidates to the question.
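The match-degree count in this example can be reproduced with a short script. It assumes that the match degree is simply the number of distinct keywords found in a sentence by case-insensitive substring matching; `match_degree` is a hypothetical helper, not a method defined in the cited documents.

```python
SENTENCES = [
    "Preparations for the year change period started on the 20th of December "
    "in Hata City Castle Ruin Park.",
    "In the park, a pine tree was decorated and became a Christmas tree.",
    "Generally, a fir tree is used as a Christmas tree.",
    "A giant Christmas tree of Rockefeller Center in New York is world-famous.",
    "However, this tree of the pine tree also has a wonderful appeal "
    "and is popular among enthusiasts.",
]
KEYWORDS = ["christmas tree", "pine tree", "popular", "enthusiasts"]


def match_degree(sentence: str, keywords: list[str]) -> int:
    """Count distinct keywords occurring in the sentence (case-insensitive substring test)."""
    text = sentence.lower()
    return sum(1 for k in keywords if k in text)


degrees = [match_degree(s, KEYWORDS) for s in SENTENCES]
print(degrees)  # [0, 2, 1, 1, 3]

# 1-based number of the sentence that best matches the keywords.
best = max(range(len(SENTENCES)), key=lambda i: degrees[i]) + 1
print(best)  # 5
```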
The true answer, “Hata City Castle Ruin Park,” is contained in sentence 1, but sentence 1 contains none of the keywords applied to the search and is therefore determined to have a low match degree with the keywords. Because the existing answer candidate extraction method preferentially selects noun phrases in the vicinity of sentences having a high match degree with the keywords, answer candidate extraction from sentence 1 is executed only after extraction from the vicinity of those sentences. If a strong answer candidate is extracted from or near a sentence with a high match degree, extraction from sentence 1 may never be executed, and consequently the true answer “Hata City Castle Ruin Park” may not be presented to the user.
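The preference for candidates near high-match sentences can be illustrated with a toy vicinity score. The scoring formula (a sentence's match degree divided by one plus its distance from the candidate's sentence, maximized over all sentences) and the hand-annotated place list are assumptions made for illustration; they show how such a heuristic ranks the true answer last, not how any particular system scores candidates.

```python
# Match degrees of sentences 1-5 as counted in the example above.
DEGREES = [0, 2, 1, 1, 3]
# Hand-annotated "place" expressions per 1-based sentence number; a real
# system would obtain these from a named-entity tagger (assumption).
PLACES = {1: ["Hata City Castle Ruin Park"], 4: ["Rockefeller Center", "New York"]}


def proximity_score(j: int, degrees: list[int]) -> float:
    """Hypothetical vicinity score for a candidate in sentence j: the best
    match degree of any sentence, discounted by its distance from j."""
    return max(d / (1 + abs(i - j)) for i, d in enumerate(degrees, start=1))


def rank_candidates(places: dict[int, list[str]], degrees: list[int]) -> list[str]:
    """Order place candidates by descending vicinity score."""
    scored = [(proximity_score(j, degrees), name)
              for j, names in places.items() for name in names]
    # Sort on the score only; Python's sort is stable, so ties keep text order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


print(rank_candidates(PLACES, DEGREES))
# ['Rockefeller Center', 'New York', 'Hata City Castle Ruin Park']
```

Sentence 4 sits next to sentence 5 (match degree 3) and scores 1.5, while sentence 1 draws its best score of 1.0 from sentence 2, so "Hata City Castle Ruin Park" is ranked below both places from sentence 4.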
The correct answer, however, is “Hata City Castle Ruin Park” in sentence 1. In the structure of the article extracted as the search result, sentences 3 and 4 on general topics are inserted in the middle of the context, so the description relating to the Christmas tree of the pine tree in Hata City Castle Ruin Park is split apart. This makes it difficult to select “Hata City Castle Ruin Park” in sentence 1 as the answer to the question sentence.
Next, consider the case where the anaphoric analysis recommended in non-patent document 2 is executed, that is, where it is determined whether the representations of noun phrases, pronouns, and zero pronouns in the text of the search result are identical with each other. The case where an existing anaphoric analysis technique based on non-patent document 2 is applied to the article obtained as the search result, and the anaphoric relations among sentences 1, 2, and 5 are identified, is discussed with reference to FIG. 1.
In anaphoric analysis, identity is determined between words with different representations. For example, as shown in FIG. 1, the anaphoric analysis determines the following:
(a1) “Hata City Castle Ruin Park” in sentence 1 and (a2) “park” in sentence 2 are identical; they indicate the same entity in different representations.
(b1) “Christmas tree” in sentence 2 and (b2) “this tree” in sentence 5 match. In languages such as Japanese, in which zero pronouns often appear, (b2) “this tree” is also recognized as a zero pronoun with respect to “popular”, and this zero pronoun is a target of the anaphoric analysis.
(c1) “pine tree” in sentence 2, (c2) (pine tree), the zero pronoun serving as the subject of “became” in sentence 2, and (c3) “the pine tree” in sentence 5 match.
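The identity relations (a), (b), and (c) form coreference chains, which are commonly stored in a union-find (disjoint-set) structure. The sketch below merges the FIG. 1 links into chains; the mention IDs follow the labels above, while the links themselves are assumed to be the output of some anaphora resolver, and the class and function names are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest over mention IDs."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


# (mention ID, surface form, sentence number); "(pine tree)" marks the zero pronoun c2.
MENTIONS = [
    ("a1", "Hata City Castle Ruin Park", 1), ("a2", "park", 2),
    ("b1", "Christmas tree", 2), ("b2", "this tree", 5),
    ("c1", "pine tree", 2), ("c2", "(pine tree)", 2), ("c3", "the pine tree", 5),
]
# Identity links assumed to be produced by an anaphora resolver (FIG. 1 relations).
LINKS = [("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("c2", "c3")]


def build_chains(mentions, links) -> list[list[str]]:
    """Group mention IDs into coreference chains, keeping text order within each chain."""
    uf = UnionFind()
    for m, _, _ in mentions:
        uf.find(m)
    for a, b in links:
        uf.union(a, b)
    chains: dict[str, list[str]] = {}
    for m, _, _ in mentions:
        chains.setdefault(uf.find(m), []).append(m)
    return sorted(chains.values())


print(build_chains(MENTIONS, LINKS))  # [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2', 'c3']]
```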
Thus, anaphoric analysis determines identity between the representations of noun phrases, pronouns, and zero pronouns in the text. In the anaphoric analysis of the related art, however, only this identity determination is made; no processing is performed to carry “Hata City Castle Ruin Park” in sentence 1 over to sentence 5, for example. Sentence 5 only states the popularity of “the Christmas tree of the pine tree” and contains no information about “Hata City Castle Ruin Park.” Consequently, if only the existing anaphoric analysis processing is executed, it is difficult to organize a complicated text as a search target or to extract the correct answer candidate; this is a problem.
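The limitation can be checked directly on the chains. The sketch below, which assumes the chains (a) to (c) as plain data and uses hypothetical helper names, follows identity links from a sentence and collects the surface forms that become reachable; from sentence 5 it reaches the Christmas tree and pine tree chains but never "Hata City Castle Ruin Park."

```python
# Coreference chains from FIG. 1 as (surface form, sentence number) pairs.
CHAINS = {
    "a": [("Hata City Castle Ruin Park", 1), ("park", 2)],
    "b": [("Christmas tree", 2), ("this tree", 5)],
    "c": [("pine tree", 2), ("(pine tree)", 2), ("the pine tree", 5)],
}


def chains_in_sentence(chains: dict[str, list[tuple[str, int]]], sentence: int) -> list[str]:
    """IDs of the chains that have at least one mention in the given sentence."""
    return sorted(cid for cid, ms in chains.items() if any(s == sentence for _, s in ms))


def surface_forms_available(chains: dict[str, list[tuple[str, int]]], sentence: int) -> set[str]:
    """Surface forms reachable from the sentence by following identity links only."""
    return {form for cid in chains_in_sentence(chains, sentence) for form, _ in chains[cid]}


print(chains_in_sentence(CHAINS, 5))  # ['b', 'c']
print("Hata City Castle Ruin Park" in surface_forms_available(CHAINS, 5))  # False
print("Hata City Castle Ruin Park" in surface_forms_available(CHAINS, 2))  # True
```

Sentence 2 shares chain (a) with sentence 1, but sentence 5 shares only chains (b) and (c) with sentence 2, so identity links alone never attach the park to the sentence that best matches the keywords.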