1. Technical Field
The present invention relates to a machine translation system. More particularly, the present invention relates to a machine translation method and apparatus which can automatically switch between multiple dictionaries. The present invention also relates to a machine-readable storage medium for storing a program for executing such a machine translation method.
2. Description of the Related Art
In general, a machine translation system includes one or more domain dictionaries in which technical terms are registered for each of various domains or fields, such as politics, sports and art. A base dictionary in which common words are registered, and a user dictionary in which a user registers words specific to that user, can also be included. The machine translation system executes translation processing by selectively using these dictionaries. For example, the translation software “Internet King of Translation (a trademark of IBM Corp.)”, marketed by the present applicant in Japan, includes a base dictionary and six domain dictionaries categorized as follows: “Internet”, “Art”, “Business”, “Sports”, “Politics” and “Entertainment”. To improve translation quality, the dictionaries used in translation, particularly the domain dictionaries, must be selected appropriately. Typically, however, the user selects or switches dictionaries manually, depending on the source text to be translated.
Technologies for automatically selecting or switching dictionaries are also known in the art. Ideally, automatic switching would first grasp the gist of the source text and then select the dictionary appropriate to the domain of that text. Oftentimes, however, the gist cannot be readily grasped. Moreover, in a text whose topic changes, it can be difficult to determine the point or portion of the text at which the domain dictionary should be switched. For this reason, a typical current method of automatic switching assigns an appropriate, predetermined keyword to each dictionary, and selects the domain dictionary associated with a keyword when that keyword appears in the source text.
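The keyword-based switching described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular product's implementation: the keyword table `DOMAIN_KEYWORDS`, its entries, and the function `select_dictionary` are hypothetical, tokenization is a naive whitespace split, and the first matching keyword wins, with the base dictionary as the fallback.

```python
# Hypothetical keyword table: each domain dictionary is tagged with
# predetermined keywords (illustrative values only).
DOMAIN_KEYWORDS: dict[str, set[str]] = {
    "Sports": {"pitcher", "goal", "tournament"},
    "Politics": {"parliament", "election", "minister"},
}


def select_dictionary(sentence: str,
                      keywords: dict[str, set[str]] = DOMAIN_KEYWORDS) -> str:
    """Return the domain whose keyword appears first in the sentence, else 'Base'."""
    for token in sentence.lower().split():
        # Naive tokenization: strip surrounding punctuation from each word.
        word = token.strip(".,;:!?\"'")
        for domain, words in keywords.items():
            if word in words:
                return domain
    # No keyword found: fall back to the base dictionary.
    return "Base"


print(select_dictionary("The pitcher threw a strike."))  # Sports
print(select_dictionary("The weather was mild."))        # Base
```

The sketch also shows the weakness noted above: the choice depends only on whether a keyword happens to occur, not on the gist of the text.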
In addition, the present inventors have developed a method of automatically switching dictionaries, described in “Translation Word Selection of Pattern Based Translation System PalmTree”, Proceedings of IPSJ 59th National Convention, 1999, pp. 2-365 to 2-366. In this method, the dictionaries consist of a system base dictionary (base dictionary) and domain dictionaries, and the entries of each are classified into compound words and single words. Further, the compound words are used as triggers for raising the priority of words. This method enables more appropriate selection of translated words.
Note that the gazette of Japanese Patent Laid-Open No. Hei 8 (1996)-166955 discloses a method in which a list of pairs of original words and translated words is prepared, and a priority order of a plurality of domain dictionaries is determined according to how words in a source text match those pairs. In addition, the gazette of Japanese Patent Laid-Open No. Hei 7 (1995)-141375 discloses a method in which a user statically designates, in advance, a priority order among a plurality of domain dictionaries. Moreover, the gazette of Japanese Patent Laid-Open No. Hei 5 (1993)-61902 discloses a method in which the domain of a source text is identified by a keyword, and the order of translated words in a system dictionary is changed accordingly.
When translated words are created from a domain dictionary, a more appropriate translated sentence can be created by raising the priority of that domain dictionary for the subsequent text. Even with this method, however, the following aspects leave room for improvement.
In general, frequently used single words and compound words are registered in the base dictionary. General compound words registered in the base dictionary include proper nouns, such as the names of sports teams and athletes, and movie titles. These proper nouns are specific to particular domains, for example the sports domain or the entertainment domain, and could therefore serve as effective triggers for automatically switching dictionaries. However, because such domain-specific words are widely known (frequently used) in general, they are registered in the base dictionary rather than in the appropriate domain dictionary. As a result, they do not function as triggers, and the expected switch to the appropriate dictionary does not occur during translation processing.
Moreover, words can be registered in a domain dictionary with translations specific to that domain. For example, in the sports domain an appropriate Japanese translation of the English “this season” is “kon shiizun”. Accordingly, the entry “this season=kon shiizun” is registered in a compound word dictionary of the sports domain. However, “this season” is originally a general expression, and the base dictionary should register “this season=kono kisetsu”. When the compound word “this season” appears while a general document, rather than a sports topic, is being translated, it nevertheless acts as a trigger that raises the priority of the sports domain. Thus the sports domain dictionary comes to be used where the base dictionary should be used. This amounts to switching to an unexpected dictionary during translation, which can make the translated sentence inappropriate. The problem that a general expression can simultaneously be an expression specific to a particular domain is not limited to translation between Japanese and English. Rather, it is an essential problem of translation, arising because the vocabularies of different languages do not coincide conceptually with each other completely. For example, the French expression “chateau” is generally translated into English as “castle”. However, when “chateau” is used in connection with the Bordeaux region, it means “winery”. Thus, even where one language has only a single expression covering a plurality of meanings, the other language often has a distinct, appropriate expression for each of those meanings. Accordingly, an entry for such a multi-meaning expression should not be used as a trigger for switching dictionaries.
Note that the French word “chateau” should properly carry a circumflex accent on the letter “a” following the letters “ch”. Owing to character-code limitations, that letter is written here simply as “a”. The same applies throughout this specification.
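The trigger mechanism of the related art and the “this season” misfire described above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: the source has already been segmented into phrases, any multi-word phrase found in a domain dictionary acts as a trigger, and a triggered domain keeps its raised priority for the rest of the text. The class `TriggerTranslator` and its fields are hypothetical; the romanized translations are the ones given above.

```python
from dataclasses import dataclass, field


@dataclass
class TriggerTranslator:
    """Hypothetical sketch of compound-word triggers raising domain priority."""
    base: dict[str, str]                     # general vocabulary
    domains: dict[str, dict[str, str]]       # domain name -> entries
    priority: list[str] = field(default_factory=list)  # raised domains, highest first

    def translate(self, phrases: list[str]) -> list[str]:
        out = []
        for phrase in phrases:
            # Trigger step (assumption): a compound word, i.e. a multi-word
            # phrase, found in a domain dictionary raises that domain's
            # priority for this and all later phrases.
            if " " in phrase:
                for name, entries in self.domains.items():
                    if phrase in entries and name not in self.priority:
                        self.priority.insert(0, name)
            out.append(self._lookup(phrase))
        return out

    def _lookup(self, phrase: str) -> str:
        # Raised-priority domain dictionaries are consulted before the base one.
        for name in self.priority:
            if phrase in self.domains[name]:
                return self.domains[name][phrase]
        return self.base.get(phrase, phrase)  # unknown phrases pass through


# A general (non-sports) document containing "this season":
t = TriggerTranslator(
    base={"this season": "kono kisetsu"},
    domains={"Sports": {"this season": "kon shiizun"}},
)
print(t.translate(["this season"]))  # ['kon shiizun']: the sports domain wins
print(t.priority)                    # ['Sports']
```

Because the sports entry itself is the trigger, the sketch returns “kon shiizun” where “kono kisetsu” was wanted. Conversely, a team name registered only in the base dictionary never reaches the trigger step, which mirrors the earlier problem with proper nouns.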
Furthermore, the domain structure of the prior art does not match the range within which appropriate translated words should be selected. For example, in the sports domain the English single word “shot” corresponds to the Japanese single word “shotto” or “shuuto”. In Japanese, “shotto” has conventionally been used as a golf term, and “shuuto” as a soccer or basketball term. To create a more appropriate translated sentence, it is therefore necessary to prepare a dictionary capable of producing the translated word appropriate to each sport within the sports domain, that is, golf, soccer, basketball, and the like.
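To make the point concrete, the following minimal Python sketch shows why a single flat sports domain cannot choose between “shotto” and “shuuto”, while per-sport sub-domains can. The hierarchy `PARENT`, the table `ENTRIES`, and the function `lookup` are hypothetical illustrations, not the method of this specification; only the word pairs come from the text above.

```python
from typing import Optional

# Hypothetical domain hierarchy: each sub-domain names its parent domain.
PARENT: dict[str, Optional[str]] = {
    "Golf": "Sports",
    "Soccer": "Sports",
    "Basketball": "Sports",
    "Sports": None,
}

# Hypothetical entries; a flat "Sports" domain has no single correct
# translation of "shot", so it carries none here.
ENTRIES: dict[str, dict[str, str]] = {
    "Golf": {"shot": "shotto"},
    "Soccer": {"shot": "shuuto"},
    "Basketball": {"shot": "shuuto"},
    "Sports": {},
}


def lookup(word: str, domain: Optional[str]) -> Optional[str]:
    """Search the given (sub-)domain first, then each ancestor domain in turn."""
    while domain is not None:
        entry = ENTRIES.get(domain, {}).get(word)
        if entry is not None:
            return entry
        domain = PARENT.get(domain)
    return None  # not found anywhere along the chain


print(lookup("shot", "Golf"))    # shotto
print(lookup("shot", "Soccer"))  # shuuto
print(lookup("shot", "Sports"))  # None
```

Walking from the most specific domain toward its parents is one plausible design; it lets sport-specific entries override broader ones while still falling back to shared sports vocabulary.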