In information processing that targets natural language data, it is difficult to automatically acquire the semantic relations between words in a form that a computer can process. Such semantic relations include the relation between a higher concept and a lower concept, the synonym relation between words that are synonymous with each other, and the like. In natural language applications, it is particularly important to acquire and use the synonym relation.
In particular, for a task such as monitoring rumors on the Internet, it is insufficient to use only the formal names of organizations and products as search queries and to take the matching documents as the monitoring targets. It becomes necessary to acquire synonym expressions such as other names, abbreviated forms, ciphered forms, and jargon for the organizations and products, and to add those to the search queries to reduce failures in monitoring. For that purpose, various methods have been proposed for automatically acquiring synonym expressions.
For example, Non-Patent Document 1 proposes a method which automatically acquires, as synonym expressions, those synonym expression candidates that appear in similar contexts. That is, based on the commonality of the words appearing in a given text, the method tries to automatically judge that a synonym relation exists when the words used together with the candidates are shared.
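The idea of judging synonymity from shared context words can be sketched as follows. This is a minimal illustration only, not the tool of Non-Patent Document 1: the function names `context_vector` and `cosine`, the window size, the use of cosine similarity, and the sample tokens (`alpha_corp`, `a_corp`, `beta_bank`) are all assumptions introduced here.

```python
from collections import Counter
from math import sqrt


def context_vector(docs, target, window=2):
    """Count the words that appear within `window` tokens of `target`.

    `docs` is a list of token lists (hypothetical, pre-tokenized input).
    """
    counts = Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Neighbours on both sides, excluding the target itself.
            counts.update(tokens[lo:i] + tokens[i + 1:hi])
    return counts


def cosine(a, b):
    """Cosine similarity between two context-word count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: a formal name and an assumed abbreviation share contexts.
docs = [
    ["alpha_corp", "recalls", "faulty", "heaters"],
    ["a_corp", "recalls", "faulty", "heaters"],
    ["beta_bank", "raises", "interest", "rates"],
]
formal = context_vector(docs, "alpha_corp")
abbrev = context_vector(docs, "a_corp")
other = context_vector(docs, "beta_bank")
print(cosine(formal, abbrev), cosine(formal, other))
```

A pair whose similarity exceeds some threshold would be treated as a synonym candidate pair. Note that no time information enters this computation.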
Further, Patent Document 1 discloses a method which defines the relevance degree between words based on the correlation coefficient of the time-series use frequencies of each search word. The method is designed to automatically create a synonym dictionary that reflects the fact that the synonym relation changes over time.
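A relevance degree based on time-series use frequencies can be sketched as below. This is an illustrative assumption rather than the procedure of Patent Document 1: it counts, per period, the records that mention each word and takes the Pearson correlation coefficient of the two series. The names `frequency_series` and `pearson`, the substring matching, and the sample records are hypothetical.

```python
from collections import Counter
from math import sqrt


def frequency_series(records, word, periods):
    """Return, for each period, the number of records whose text mentions `word`.

    `records` is a list of (period, text) pairs; substring matching is a
    simplification assumed for this sketch.
    """
    counts = Counter(period for period, text in records if word in text)
    return [counts[p] for p in periods]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


# Hypothetical records: (period, text).
records = [
    (1, "alpha corp recall"), (1, "a-corp recall news"),
    (2, "alpha corp"), (2, "alpha corp again"), (2, "a-corp"), (2, "a-corp rumor"),
    (3, "alpha corp"), (3, "a-corp"),
]
periods = [1, 2, 3]
formal = frequency_series(records, "alpha corp", periods)
abbrev = frequency_series(records, "a-corp", periods)
print(formal, abbrev, pearson(formal, abbrev))
```

In this sketch a single coefficient is computed over the whole observation period, so it implicitly assumes that the relation between the two words is the same throughout that period.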
Patent Document 2 discloses a method for extracting the synonym relation between jargon expressions such as “*bishi Denki” and “Bo-A chou” and the original expressions such as “Mitsubishi Denki (Mitsubishi Electric)” and “Bouei Chou (the Defense Agency)”. The method generates a collation index from a list of jargon notations such as “*”, and extracts the synonym relation by collating the index with the original expressions.
Patent Document 3 discloses a method for extracting the synonym relation between the name of a program and its abbreviation, informal name, or the like. The method uses information regarding broadcast stations and broadcast times, and takes as synonyms the words that remain after the series name and the titles of the individual broadcasts are excluded from the synonym candidates.

Patent Document 1: Japanese Unexamined Patent Publication Hei 11-312168
Patent Document 2: Japanese Unexamined Patent Publication 2003-296354
Patent Document 3: Japanese Unexamined Patent Publication 2006-163710
Non-Patent Document 1: Terada et al., “A Tool for Constructing a Synonym Dictionary using Context Information”, Information Processing Society of Japan, Natural Language Technical Report 2006 (124), November 2006, pp. 87-94
However, among synonym expressions, there are cases where abbreviations and ciphered expressions are ambiguous and cases where their meanings change over time. Thus, it is difficult for the existing synonym expression acquiring methods to determine synonyms correctly. For example, “To* Denryoku” can be regarded as a ciphered expression of “Tokyo Denryoku (Tokyo Electric Power Company, Incorporated)”. However, it can also be a ciphered expression of “Tohoku Denryoku (Tohoku Electric Power Company, Incorporated)”. In this case, the content indicated by “To* Denryoku” is polysemous: it may indicate either “Tokyo Denryoku” or “Tohoku Denryoku”.
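The ambiguity of a ciphered expression can be made concrete with a small sketch. It assumes that “*” stands for one or more omitted characters and collates the ciphered expression against a list of original expressions with a regular expression. This is a simplified stand-in, not the collation index of Patent Document 2; the function names `mask_to_regex` and `candidates` are hypothetical.

```python
import re


def mask_to_regex(masked, wildcard="*"):
    """Turn a ciphered expression into a regex; each wildcard matches 1+ characters."""
    parts = [re.escape(p) for p in masked.split(wildcard)]
    return re.compile(".+".join(parts))


def candidates(masked, originals):
    """Return every original expression that the ciphered expression could stand for."""
    pattern = mask_to_regex(masked)
    return [o for o in originals if pattern.fullmatch(o)]


# Original expressions taken from the examples in this description.
originals = ["Mitsubishi Denki", "Bouei Chou", "Tokyo Denryoku", "Tohoku Denryoku"]

print(candidates("*bishi Denki", originals))  # a single original expression matches
print(candidates("To* Denryoku", originals))  # two original expressions match
```

Surface collation of this kind can enumerate the possible originals, but it provides no basis for deciding which of them a given occurrence of “To* Denryoku” indicates.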
Further, in practice, the content indicated by “To* Denryoku” can switch between “Tokyo Denryoku” and “Tohoku Denryoku” over time. For example, in the case of FIG. 13, “To* Denryoku” indicates “Tokyo Denryoku” at time A and time C while indicating “Tohoku Denryoku” at time B, which is an example where the synonym relation changes over time.
In such a case, the existing methods do not take into consideration synonym expressions whose content changes over time, so that the synonym relation cannot be determined correctly. A method such as that of Non-Patent Document 1, which uses the context for determining the synonym relation, does not use time information and therefore does not take into consideration the fact that the synonym relation changes over time.
Further, in a case where the synonym relation changes over time, even the method of Patent Document 1, which uses the time-series correlation, does not yield a high correlation between “To* Denryoku” and either “Tokyo Denryoku” or “Tohoku Denryoku”, as shown in FIG. 13. Thus, it is not possible to determine that there is a synonym relation. That is, Patent Document 1 also determines the synonymity on the assumption that the synonymity does not change over time. Therefore, in a case where the synonymity changes over time, it is not possible to determine the time interval in which the synonymity exists.
With Non-Patent Document 1 described above, the synonymity can be judged by using the contexts. However, since time information is not used, synonymity that changes depending on the time cannot be captured.
That is, in a case where the synonymity changes over time and a single synonym candidate becomes synonymous with different synonym words over time, the time-series correlation calculated by a method such as that of Patent Document 1 does not become high. As a result, the synonym relation cannot be extracted.
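The following sketch illustrates why a correlation taken over the whole period stays low when the referent of a synonym candidate switches. The frequency values are invented for illustration and are not the data of FIG. 13; the candidate series is assumed, for simplicity, to follow “Tokyo Denryoku” exactly in intervals A and C and “Tohoku Denryoku” exactly in interval B. The helper `pearson` and the interval boundaries are likewise assumptions of this sketch, which only illustrates the problem and is not a method of any of the cited documents.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


# Invented use frequencies over nine time steps (three per interval).
tokyo = [5, 9, 4, 6, 2, 8, 3, 7, 5]
tohoku = [4, 2, 7, 1, 8, 3, 6, 2, 9]
intervals = {"A": slice(0, 3), "B": slice(3, 6), "C": slice(6, 9)}

# The candidate follows tokyo in A and C, and tohoku in B.
masked = tokyo[0:3] + tohoku[3:6] + tokyo[6:9]

# Whole-period correlation: low against both synonym words.
print("whole period:", pearson(masked, tokyo), pearson(masked, tohoku))

# Per-interval correlation: high against exactly one synonym word in each interval.
for name, s in intervals.items():
    print(name, pearson(masked[s], tokyo[s]), pearson(masked[s], tohoku[s]))
```

With these invented values, both whole-period coefficients are close to zero, whereas within each interval the candidate correlates strongly with the synonym word it actually indicates there.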
Further, with the method described in Patent Document 2, it is possible to generate a synonym candidate for a synonym word by using characters used for ciphering or omission (for example, “◯”, or “cl”, a combination of “c” and “L” standing in for “d”). However, it is not possible to capture changes in the meaning of the synonym candidate over time.
With the method described in Patent Document 3, time information is used for determining synonym words. However, the method targets information from a single information source (a broadcast station), so that it cannot be applied to a text set gathered from an unspecified large number of sources.
Further, with the techniques of Non-Patent Document 1 and Patent Documents 1 to 3 described above, and with combinations thereof, the synonymity between a synonym candidate and a synonym word cannot be determined correctly when the meaning of the synonym candidate changes over time.
It is therefore an object of the present invention to provide a synonym relation determination device, a synonym relation determination method, and a program thereof which make it possible to effectively extract and specify the synonym relation of a synonym candidate whose meaning changes over time, from natural language words used in texts from an unspecified large number of sources.