1. Field of the Invention
This invention relates to a natural language processing system that uses a computer as a question answering system, which outputs an answer to a question statement expressed in a natural language. More specifically, the present invention relates to a question answering system that, when extracting answer candidates for a question, adds up the evaluation points of a plurality of answer candidates having the same language expression to obtain their evaluation point, and outputs as the answer an answer candidate whose evaluation point is higher than a predetermined level.
A question answering system is a system that, when a question in a natural language is input, outputs the answer to the question itself. For example, suppose the question “which part of the brain whose cells are dead is related to symptoms of Parkinson's disease?” is input to the question answering system. The question answering system finds the statement “Parkinson's disease is said to be caused when melanocytes in the substantia nigra of the midbrain denature and dopamine, a neurotransmitter produced in substantia nigra cells, is lost” in a massive amount of digitized text such as Web pages, news articles and encyclopedias, and outputs the precise answer “substantia nigra.”
Since the question answering system extracts an answer not from logical expressions or databases but from plain statements (text data) written in a natural language, it can make use of the massive amount of existing document data.
Furthermore, unlike an information retrieval system, in which the user must find the answer within articles retrieved using keywords, the question answering system outputs the answer itself precisely, so the user can obtain the desired information more quickly.
Furthermore, because the question answering system outputs the answer itself automatically, it can also serve as a component of other automatic knowledge processing systems, and it is regarded as a fundamental technology required for building artificial intelligence systems.
For these reasons, the question answering system is expected to become a core system for future intelligent processing and knowledge processing, and there are high expectations for improvements in its processing performance.
2. Description of the Related Art
A general question answering system consists roughly of three processing means: answer expression estimation processing, document retrieval processing and answer extraction processing.
The answer expression estimation processing estimates an answer expression based on expressions such as interrogative pronouns in the entered question statement. An answer expression is the type of language expression that the desired answer takes. The question answering system defines in advance which language expressions in a question statement call for which answer expressions. When the entered question statement is, for example, “what is an approximate area of Japan?”, the question answering system refers to this predefined correspondence and estimates from the expression “what is an approximate area” that the answer expression will be a “numerical expression.” Likewise, when the question statement is “who is Japan's prime minister?”, the question answering system estimates from the expression “who” that the answer expression will be a “proper noun (personal name).”
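The predefined correspondence between question expressions and answer expressions can be pictured as an ordered rule table. The following is a minimal sketch, assuming simple regular-expression rules; the table `ANSWER_EXPRESSION_RULES`, the function `estimate_answer_expression` and the type labels are hypothetical and are not taken from any particular system.

```python
import re
from typing import Optional

# Hypothetical correspondence table: (pattern in the question statement,
# answer expression). Patterns and labels are illustrative assumptions only.
ANSWER_EXPRESSION_RULES = [
    (re.compile(r"\bwhat is an approximate (area|size|population)\b", re.I),
     "numerical expression"),
    (re.compile(r"\bhow (many|much|long|far)\b", re.I), "numerical expression"),
    (re.compile(r"\bwho\b", re.I), "proper noun (personal name)"),
    (re.compile(r"\bwhere\b", re.I), "proper noun (place name)"),
]


def estimate_answer_expression(question: str) -> Optional[str]:
    """Return the answer expression of the first rule that matches the question."""
    for pattern, answer_expression in ANSWER_EXPRESSION_RULES:
        if pattern.search(question):
            return answer_expression
    return None  # no rule applies; the answer expression stays unknown


print(estimate_answer_expression("what is an approximate area of Japan?"))
print(estimate_answer_expression("who is Japan's prime minister?"))
```

Rules are checked in order, so more specific patterns should be listed before general ones.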
The document retrieval processing extracts keywords from the question statement, searches a document data group using the extracted keywords and extracts the document data in which the answer is considered to be written. When the entered question statement is, for example, “what is an approximate area of Japan?”, the question answering system extracts “Japan” and “area” as keywords from the question statement and retrieves document data containing both keywords from the document data groups to be searched.
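Keyword extraction followed by retrieval of documents containing every keyword might look like the sketch below. It assumes a naive stop-word filter for keyword extraction (real systems typically use morphological analysis) and an all-keywords match; `STOP_WORDS`, `extract_keywords`, `retrieve_documents` and the sample documents are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical stop-word list used to pick keywords out of the question.
STOP_WORDS = {"what", "is", "an", "a", "the", "of", "approximate", "who", "which"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(question: str) -> List[str]:
    """Naive keyword extraction: every token that is not a stop word."""
    return [t for t in tokenize(question) if t not in STOP_WORDS]


def retrieve_documents(keywords: List[str], documents: Dict[int, str]) -> List[int]:
    """Return the numbers of the documents that contain all of the keywords."""
    return [number for number, text in documents.items()
            if set(keywords) <= set(tokenize(text))]


# Made-up sample document data, keyed by document number.
DOCUMENTS = {
    1: "Japan has a total area of about 378,000 square kilometres.",
    2: "The area of France is larger than that of Germany.",
    3: "Japan is an island country in East Asia.",
}

keywords = extract_keywords("what is an approximate area of Japan?")
print(keywords, retrieve_documents(keywords, DOCUMENTS))
```

Requiring every keyword is the strictest choice; a ranked, partial-match retrieval would return more documents at the cost of precision.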
The answer extraction processing extracts, from the document data retrieved by the document retrieval processing, a language expression that matches the estimated answer expression and outputs it as the answer. In the example above, the question answering system extracts, as the answer, a language expression corresponding to the “numerical expression” estimated by the answer expression estimation processing from the document data containing the keywords “Japan” and “area” retrieved by the document retrieval processing.
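For the “numerical expression” case, answer extraction can be approximated by pattern matching over the retrieved documents. This is a minimal sketch, assuming a regular expression for numbers with an optional area unit; `NUMERICAL_EXPRESSION`, `extract_answers` and the sample text are hypothetical, and other answer expressions are simply not handled here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for a "numerical expression": a number, optionally with
# thousands separators and a decimal part, plus an optional unit of area.
NUMERICAL_EXPRESSION = re.compile(
    r"\d+(?:,\d{3})*(?:\.\d+)?(?:\s*(?:square kilometres|square kilometers|km2))?"
)


def extract_answers(answer_expression: str, doc_numbers: List[int],
                    documents: Dict[int, str]) -> List[Tuple[str, int]]:
    """Return (language expression, document number) pairs that match the
    estimated answer expression. Only numerical expressions are handled."""
    if answer_expression != "numerical expression":
        return []
    return [(m.group(0), number)
            for number in doc_numbers
            for m in NUMERICAL_EXPRESSION.finditer(documents[number])]


DOCUMENTS = {1: "Japan has a total area of about 378,000 square kilometres."}
print(extract_answers("numerical expression", [1], DOCUMENTS))
```

Each extracted expression keeps its document number, which is the information later needed to list or aggregate answer candidates.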
Through the processing described above, the question answering system outputs the answer “Tokyo” in response to the question statement “what is the capital of Japan?” There are also question answering systems that, when outputting an answer, assign to each answer candidate points for evaluating it (evaluation points), such as a degree of matching, and output as the answer an answer candidate that has acquired a predetermined evaluation point. For example, suppose that evaluation points are assigned to the answer candidates for the question statement “what is the capital of Japan?” and that the answer candidate data are output in the format “rank; answer candidate; evaluation point; identification information (document number) of the document data from which the answer candidate was extracted” as follows:
1; Kyoto; 3.3; document number 134
2; Tokyo; 3.2; document number 12
3; Tokyo; 2.8; document number 455
4; Tokyo; 2.5; document number 371
5; Tokyo; 2.4; document number 221
6; Beijing; 2.2; document number 113
If the question answering system adopts the first-ranked answer candidate, it outputs “Kyoto” as the answer, which is wrong because the correct answer is “Tokyo.”
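Adopting the candidate with the highest individual evaluation point can be sketched as follows, using the candidate data listed above. The `AnswerCandidate` type and the `top_candidate` function are hypothetical names introduced only for this illustration.

```python
from typing import List, NamedTuple


class AnswerCandidate(NamedTuple):
    expression: str   # language expression of the candidate
    score: float      # evaluation point
    doc_number: int   # document the candidate was extracted from


# Candidate data from the "capital of Japan" example.
CANDIDATES = [
    AnswerCandidate("Kyoto", 3.3, 134),
    AnswerCandidate("Tokyo", 3.2, 12),
    AnswerCandidate("Tokyo", 2.8, 455),
    AnswerCandidate("Tokyo", 2.5, 371),
    AnswerCandidate("Tokyo", 2.4, 221),
    AnswerCandidate("Beijing", 2.2, 113),
]


def top_candidate(candidates: List[AnswerCandidate]) -> AnswerCandidate:
    """Adopt the single candidate with the highest individual evaluation point."""
    return max(candidates, key=lambda c: c.score)


print(top_candidate(CANDIDATES).expression)
```

Because “Kyoto” has the single highest point (3.3), this rule selects it even though “Tokyo” is extracted from four separate documents.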
Within the document data that is the target of answer retrieval, a language expression that appears at many locations together with expressions related to the content of the question can be considered more closely related to the question and therefore more likely to match its answer. Based on this idea, there is a technique that, for answer candidates with the same language expression appearing in different document data or at different locations in the same document data, adds up the evaluation points of those answer candidates and uses the sum as the evaluation point of that answer candidate (see, for example, Reference 1).
[Reference 1: Toru Takaki, Yoshio Eriguchi, “NTTDATA Question-Answering Experiment at the NTCIR-3 QAC”, National Institute of Informatics, The NTCIR Workshop 3 Meeting (3rd NTCIR workshop meeting), October 2002, pp. 95-100]
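The addition technique described above can be written compactly. In this formula, which uses notation introduced here only for illustration, $C$ is the set of extracted answer candidates, $e(c)$ is the language expression of candidate $c$, $s(c)$ is its evaluation point, and $\theta$ is the predetermined level:

```latex
S(a) = \sum_{c \in C,\; e(c) = a} s(c), \qquad
\hat{a} = \operatorname*{arg\,max}_{a} S(a) \quad \text{adopted if } S(\hat{a}) \ge \theta
```

Each language expression $a$ thus receives the total of the points of all candidates that share it, regardless of how high or low each individual point is.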
For example, suppose this conventional technique of simply adding up evaluation points is applied to the answer candidates above for the question statement “what is the capital of Japan?” If the evaluation points given to the answer candidate “Tokyo”, which appears in four document data pieces (or at four locations), are added up and used as the evaluation point of the answer candidate “Tokyo”, the evaluation ranking of the answer candidates for the question statement becomes:
1; Tokyo; 10.9; document numbers 12, 455, 371, 221
2; Kyoto; 3.3; document number 134
3; Beijing; 2.2; document number 113
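The ranking above can be reproduced by grouping candidates by language expression and summing their evaluation points. This is a minimal sketch of simple addition as described in the text; the function name `aggregate_scores` and the rounding to six decimals (to hide floating-point noise) are assumptions of this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (language expression, evaluation point, document number) from the example.
CANDIDATES: List[Tuple[str, float, int]] = [
    ("Kyoto", 3.3, 134),
    ("Tokyo", 3.2, 12),
    ("Tokyo", 2.8, 455),
    ("Tokyo", 2.5, 371),
    ("Tokyo", 2.4, 221),
    ("Beijing", 2.2, 113),
]


def aggregate_scores(
    candidates: List[Tuple[str, float, int]]
) -> List[Tuple[str, float, List[int]]]:
    """Sum evaluation points per language expression and rank by the total."""
    totals: Dict[str, float] = defaultdict(float)
    docs: Dict[str, List[int]] = defaultdict(list)
    for expression, score, doc_number in candidates:
        totals[expression] += score
        docs[expression].append(doc_number)
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [(e, round(totals[e], 6), docs[e]) for e in ranked]


for rank, (expression, total, doc_numbers) in enumerate(aggregate_scores(CANDIDATES), start=1):
    print(f"{rank}; {expression}; {total:.1f}; document numbers {doc_numbers}")
```

The total for “Tokyo” is 3.2 + 2.8 + 2.5 + 2.4 = 10.9, which moves it above “Kyoto” (3.3).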
The question answering system then adopts the first-ranked answer candidate “Tokyo”, so the answer it outputs is correct.
However, the conventional art shown in Reference 1 simply adds up, for each language expression, the evaluation points of the answer candidates extracted from the document data that is the answer retrieval target, and adopts as the answer an answer candidate whose total is equal to or higher than a predetermined level. This has the problem that a language expression appearing frequently in the document data tends to be selected as the answer, so the accuracy of the answer does not necessarily improve.
This problem becomes more serious when simple evaluation point addition is applied to a question answering system whose answer candidate extraction processing is itself highly accurate. In such a system, the evaluation points assigned by the original extraction processing are highly reliable; yet when the conventional addition technique is applied, answer candidates are selected based on totals computed simply from those evaluation points. As a result, many answer candidates whose individual evaluation points are low receive high totals and are ranked higher, which actually reduces the answering accuracy.
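The effect described above can be shown with a toy comparison. The candidates “A” and “B” and their points below are invented purely for illustration and do not come from any experiment: “A” is extracted once with a high, reliable point, while “B” is extracted four times with low points. The functions `best_by_max` and `best_by_sum` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Invented data: one highly rated candidate versus a frequent, low-rated one.
CANDIDATES: List[Tuple[str, float]] = [
    ("A", 0.9),
    ("B", 0.3), ("B", 0.3), ("B", 0.3), ("B", 0.3),
]


def best_by_max(candidates: List[Tuple[str, float]]) -> str:
    """Trust the original extraction: adopt the highest individual point."""
    return max(candidates, key=lambda c: c[1])[0]


def best_by_sum(candidates: List[Tuple[str, float]]) -> str:
    """Conventional simple addition: adopt the highest per-expression total."""
    totals: Dict[str, float] = defaultdict(float)
    for expression, score in candidates:
        totals[expression] += score
    return max(totals, key=totals.get)


print(best_by_max(CANDIDATES), best_by_sum(CANDIDATES))
```

Simple addition gives “B” a total of about 1.2, which overtakes the single reliable point 0.9 of “A”, so frequency alone decides the answer.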