1. Field of the Invention
The present invention relates to an apparatus/method for comparing text sentences with each other to check differences in their semantic contents by using, for example, a computer. More specifically, the present invention relates to an apparatus/method for comparing text sentences with high precision and in real time.
2. Description of the Related Art
With the rapid progress of information technology, and especially of high-speed Internet mobile technology, very large amounts of information can be utilized by anybody, anywhere, at any time. Conversely, a so-called “information-flood phenomenon” may occur, in which users can hardly acquire the information that they truly require. To realize a world in which users can continuously acquire proper information under any conditions, the information that has true value for these users must be extracted and reconstructed from such an information flood.
In this context, techniques for comparing the semantic contents of documents with each other, techniques for classifying text documents in accordance with their semantic contents, and techniques for understanding the information-search intentions of users constitute important aspects. To realize any of these, similarity judgments as to meaning, made by utilizing natural language processing technologies, are required.
In this field, several techniques for judging similarity between text sentences have been proposed. However, most of them utilize only local information of the sentences, for example, information on the words appearing in the sentences and information on the dependency relations between words. They can therefore hardly be applied as a basis for evaluating the semantic contents of text sentences; that is, they cannot achieve the goal of comparing the semantic contents of documents with each other and understanding the information-search intentions of users.
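The limitation of word-level local information can be illustrated with a minimal sketch. The code below is not taken from any of the cited prior art; it assumes, purely for illustration, a similarity defined as the Jaccard overlap of the word sets of two sentences. The function names `word_set` and `jaccard_similarity` and the example sentences are hypothetical.

```python
def word_set(sentence):
    """Return the set of lowercase, whitespace-separated words in a sentence."""
    return frozenset(sentence.lower().split())


def jaccard_similarity(sentence_a, sentence_b):
    """Word-overlap similarity: |A ∩ B| / |A ∪ B| over the two word sets.

    This uses only which words appear (local information) and ignores
    word order and the roles the words play in the sentence.
    """
    words_a = word_set(sentence_a)
    words_b = word_set(sentence_b)
    if not words_a and not words_b:
        return 1.0  # two empty sentences are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Two sentences with the same words but opposite meanings.
s1 = "the dog bites the man"
s2 = "the man bites the dog"
print(jaccard_similarity(s1, s2))  # 1.0
```

Under this assumed measure, the two sentences receive the maximum similarity although the agent and the object of the action are exchanged, which is the kind of difference in semantic content that a purely word-based measure cannot express.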
Very recently, a method has been proposed in which text sentences are semantically analyzed, the analyzed text sentences are represented in the form of graphs, and an experimental similarity is then measured based upon the graph representations. However, the proposed similarity is measured without considering structural changes, and the relationship between the definition of the similarity and the differences in the semantic contents of the text sentences is not clearly defined.
As examples of conventional techniques related to the present invention, the following prior art has been proposed.
[Non-Patent Publication 1]
“Japanese Semantic Analysis System SAGE using EDR” written by Harada and Mizuno, “Japanese Society for Artificial Intelligence” in 2001, 16(1), pages 85 to 93.
[Non-Patent Publication 2]
“A Quantitative Representation of Features based on Words and Documents Co-occurences” written by Shoko Aizawa, “Natural Language Processing” in March, 2000, 136-4.
[Non-Patent Publication 3]
“Self-Organizing Semantic Map of Japanese Nouns” written by Q. Ma, “Information Processing Society of Japan”, volume 42, No. 10, in 2001.
[Non-Patent Publication 4]
“The Metric Between Trees based on the Strongly Structure Preserving Mapping and Its Computing Method” written by Tanaka, “The Institute of Electronics, Information and Communication Engineers”, volume No. J67-D, No. 6, pages 722 to 723, in 1984.
[Non-Patent Publication 5]
“Algorithms for computing the Distances between unordered Trees” written by Liu and Tanaka, “The Institute of Electronics, Information and Communication Engineers”, volume No. J78-A, No. 10, pages 1358 to 1371, in 1995.
As described above, the conventional systems have the problem that their performance in comparing the similarity of the semantic contents of text sentences is still inadequate. Moreover, the conventionally proposed similarity can hardly be linked to an explanation of the differences in the semantic contents of the text sentences.