The present invention generally relates to near-synonym generating methods, and more particularly to a near-synonym generating method that divides a character string to be retrieved into words and generates near-synonyms of the character string by combining the near-synonyms extracted for each of the words.
Generating near-synonyms is essential for retrieving various electronic documents with high accuracy. A near-synonym is sometimes also referred to as a "quasi-synonym". A near-synonym of a certain word is a word having the same or a similar meaning as that word. Generating near-synonyms is particularly effective when matters related to a certain theme must be retrieved from a large-scale database without omission.
An example of conventional document retrieval using near-synonyms will be described with reference to FIG. 1. In FIG. 1, it is assumed for the sake of convenience that the character string to be retrieved (hereinafter simply referred to as the "target character string") is "holiday/application/member/number". A predetermined electronic document is retrieved using this target character string, and near-synonyms of the target character string are extracted from the electronic document. As shown in FIG. 1, the electronic document contains a plurality of near-synonyms of the target character string.
Conventionally, the near-synonyms extracted as a result of such a retrieval were limited to character strings identical to the target character string "holiday application member number" and character strings whose head matches this target character string.
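The head-matching extraction can be illustrated with a minimal sketch. It assumes that "a head which matches" means the candidate string begins with the whole target character string, and the sample strings (including "holiday application member number list") are hypothetical stand-ins for the contents of the electronic document in FIG. 1.

```python
# Hypothetical sketch of conventional head-matching extraction.
# Assumption: a candidate "matches" if it begins with the whole target string.

def head_match_extract(target: str, candidates: list[str]) -> list[str]:
    """Return candidates that are identical to the target or start with it."""
    # str.startswith is also True when the candidate equals the target,
    # so identical strings are covered by the same test.
    return [c for c in candidates if c.startswith(target)]


# Hypothetical strings standing in for the electronic document of FIG. 1.
SAMPLE_STRINGS = [
    "holiday application member number",
    "holiday application member number list",
    "vacation notice member number",
    "vacation report member No.",
    "employee number",
]

if __name__ == "__main__":
    print(head_match_extract("holiday application member number", SAMPLE_STRINGS))
```

Only the first two sample strings are extracted; "vacation notice member number" and the other strings with the same meaning are missed because their heads differ.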
Another known document retrieval method carries out the retrieval as follows. The target character string "holiday application member number" is divided into the words "holiday", "application", "member" and "number" that form it, and near-synonyms are extracted for each of these words. For example, "report", "employee" and "numeral" are extracted as the near-synonyms of "application", "member" and "number", respectively. Such near-synonyms are defined in advance for each word. The electronic document is then retrieved using the character string "holiday report employee numeral" obtained by combining the extracted near-synonyms, and character strings identical to this combined string are extracted as near-synonyms of the target character string.
Each element forming the target character string is called a "word", and a character string which is made up of a plurality of words is called a "compound word".
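The word-by-word combination method can be sketched as follows. This is an illustrative reconstruction, not a prescribed implementation: the dictionary `NEAR_SYNONYMS` and the helper names are hypothetical, the dictionary entries are taken from the example above, and each word is assumed to count as its own near-synonym so that the original word can also appear in a combination. The target string is divided at the "/" end symbols, as in FIG. 1.

```python
from itertools import product

# Hypothetical near-synonym dictionary defined in advance for each word
# (entries taken from the example in the text).
NEAR_SYNONYMS: dict[str, list[str]] = {
    "application": ["report"],
    "member": ["employee"],
    "number": ["numeral"],
}


def split_words(target: str) -> list[str]:
    """Divide the target string at the operator-inserted end symbol '/'."""
    return [w for w in target.split("/") if w]


def generate_combinations(target: str) -> set[str]:
    """Combine each word (or one of its near-synonyms) into compound words."""
    # Assumption: each word is also treated as its own near-synonym.
    choices = [[w] + NEAR_SYNONYMS.get(w, []) for w in split_words(target)]
    return {" ".join(combo) for combo in product(*choices)}


def retrieve(target: str, document_strings: list[str]) -> list[str]:
    """Extract document strings identical to one of the generated combinations."""
    combos = generate_combinations(target)
    return [s for s in document_strings if s in combos]
```

For "holiday/application/member/number" the sketch produces 1 × 2 × 2 × 2 = 8 combinations, among them "holiday report employee numeral". Because retrieval is an exact comparison against these fixed-length, fixed-order strings, the result depends entirely on how the near-synonyms of each word are defined.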
According to the conventional document retrieval methods described above, a character string which is the same as the target character string and a character string having a head which matches that of the target character string were extractable as near-synonyms.
However, these methods could not extract character strings whose head (or part of whose head) does not match that of the target character string, nor character strings that consist of entirely different words and phrases (different-sounding synonyms) from the target character string yet have the same meaning. For example, in FIG. 1, it was impossible to extract "vacation notice member number" or "vacation report member No.".
On the other hand, the conventional document retrieval method that combines the near-synonyms extracted for each of the words obtained by dividing the target character string achieved satisfactory extraction to a certain extent, depending on how the near-synonyms of each word were defined.
However, this method could not extract character strings in which some of the words of the target character string (or their near-synonyms) are missing, character strings to which one or more words unrelated to the target character string have been added, or character strings in which the words or their near-synonyms appear in a different order from that of the target character string. For example, in FIG. 1, it was impossible to extract "employee number", which lacks the parts corresponding to "holiday" and "application", or "member holiday application number", in which the parts corresponding to "holiday", "application" and "member" appear in a different order from the target character string.
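Why these strings are missed can be checked word by word. The following sketch is a hypothetical predicate, assuming the same illustrative near-synonym entries as the example in the text, that decides whether a candidate could have been generated by combining per-word near-synonyms: the combination fixes both the number of words and the position of each word.

```python
# Hypothetical near-synonym dictionary (illustrative entries from the text).
NEAR_SYNONYMS: dict[str, tuple[str, ...]] = {
    "application": ("report",),
    "member": ("employee",),
    "number": ("numeral",),
}


def producible_by_combination(target_words: list[str], candidate: str) -> bool:
    """Return True if combining per-word near-synonyms could yield `candidate`."""
    cand_words = candidate.split()
    # Combination preserves the word count, so strings with missing
    # or added words can never be generated.
    if len(cand_words) != len(target_words):
        return False
    # Combination also preserves word order: position i may hold only
    # target word i or one of its predefined near-synonyms.
    return all(
        c == t or c in NEAR_SYNONYMS.get(t, ())
        for t, c in zip(target_words, cand_words)
    )


TARGET = ["holiday", "application", "member", "number"]
```

With this check, "holiday report employee numeral" is accepted, while "employee number" (missing words) and "member holiday application number" (reordered words) are rejected regardless of how rich the per-word dictionary is.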
In addition, the conventional document retrieval method that combines the near-synonyms of the words required the operator to manually insert an end symbol (character) "/" at the end of each word in the target character string so that the string could be divided into words. The operator therefore had to have knowledge of the words and their near-synonyms. Moreover, when the target character string is long and contains many words, or when many target character strings must be input, inserting the end symbols is troublesome and places a heavy burden on the operator.
Therefore, the conventional document retrieval methods achieved satisfactory retrieval only to a certain extent, and, as the above examples show, the extraction results often omitted necessary near-synonyms. For this reason, highly accurate document retrieval could not be achieved with the conventional methods. In other words, conventional near-synonym generation was unsuitable or insufficient for carrying out retrieval with high accuracy.