A machine translation system generally includes a system base dictionary registering common words and a user dictionary registering user-specific words, as well as one or more domain dictionaries registering specialized terms for each specialized domain or field such as politics, sports, art, etc., and performs translation processing by selectively using these domain dictionaries. For example, a translation program named “Internet King of Translation”, which is licensed by the present assignee in Japan, includes six domain dictionaries categorized as “Internet”, “Art”, “Business”, “Sports”, “Politics”, and “Entertainment”, in addition to a base dictionary. To improve translation quality, it is necessary to suitably select a dictionary, particularly a domain dictionary, used for the translation. In the past, it was common practice for a user to manually select a dictionary, or switch to another dictionary, depending on the source text to be translated.
Some techniques for automatically performing dictionary selection or switching have also been known. In automatic dictionary switching, it would be ideal to select an appropriate domain dictionary by grasping the gist of a source text. However, it is not easy to grasp the gist and, moreover, it is difficult to decide the point at which a domain dictionary should be switched to another one in a text in which topics shift one after another. For this reason, a typical method currently used for automatic dictionary switching selects, in advance, particular keywords for selecting dictionaries, and selects the domain dictionary corresponding to a keyword when that keyword appears in a source text.
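The keyword-driven method described above can be illustrated with a minimal sketch. The keyword table, the function name `select_domains`, and the default dictionary name `"Base"` are assumptions made for illustration only and are not taken from any particular product.

```python
# Hypothetical keyword table (assumed for illustration):
# trigger keyword -> name of the domain dictionary it selects.
KEYWORD_TO_DOMAIN = {
    "golf": "Sports",
    "election": "Politics",
    "museum": "Art",
}


def select_domains(sentences, default="Base"):
    """Return the dictionary in effect for each sentence.

    The dictionary switches when a trigger keyword appears and then
    stays selected until another keyword is encountered.
    """
    current = default
    selected = []
    for sentence in sentences:
        for word in sentence.lower().split():
            word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
            if word in KEYWORD_TO_DOMAIN:
                current = KEYWORD_TO_DOMAIN[word]
                break  # first keyword in the sentence decides
        selected.append(current)
    return selected
```

In this sketch the selected dictionary simply persists until the next keyword is seen, so a sentence that returns to a general topic is still translated with the previously selected domain dictionary.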
Another technique, disclosed for example in Japanese Unexamined Patent Publication No. 6-60117, reads source text data stored in a data file into a workstation, analyzes the sentence structure thereof, checks by using a translation system whether or not a translated word in each sentence exists in each of five domain dictionaries, increments a translated word check counter corresponding to each dictionary that includes the translated word, and sets the selection priority of each domain dictionary depending on the count values of the check counters.
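The counter-based prioritization described above can be sketched roughly as follows. This is not the implementation of the cited publication; the dictionary contents, the use of three rather than five domains, and the name `rank_dictionaries` are assumptions for illustration.

```python
# Hypothetical domain dictionaries (assumed for illustration):
# domain name -> set of words registered in that dictionary.
DOMAIN_DICTIONARIES = {
    "Sports": {"tour", "birdie", "tournament"},
    "Politics": {"election", "parliament"},
    "Art": {"gallery", "tour"},
}


def rank_dictionaries(words):
    """Return domain names ordered by how many words each dictionary contains."""
    # One check counter per domain dictionary, starting at zero.
    counts = {name: 0 for name in DOMAIN_DICTIONARIES}
    for word in words:
        for name, entries in DOMAIN_DICTIONARIES.items():
            if word in entries:
                counts[name] += 1
    # Highest count first; ties are broken alphabetically for determinism.
    return sorted(counts, key=lambda name: (-counts[name], name))
```

Every hit counts equally in this sketch, whether the word is a common single word or a specialized term, so a word such as "tour" raises the counters of both "Sports" and "Art".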
Also, Japanese Unexamined Patent Publication No. 10-21222 discloses a technique in which a predetermined condition for translation processing is set based on document identification information that is used for identifying a document of a first language when the document is accessed. According to one embodiment thereof, a particular domain is determined by using an Internet URL as the document identification information, and a domain dictionary corresponding to the determined domain is selected.
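A URL-based selection of this kind can be sketched as a simple table lookup. The host names, the table `HOST_TO_DOMAIN`, and the function `domain_for_url` are hypothetical and are not taken from the cited publication.

```python
from urllib.parse import urlparse

# Hypothetical association table (assumed for illustration):
# host name -> domain dictionary to use for documents from that host.
HOST_TO_DOMAIN = {
    "sports.example.com": "Sports",
    "news.example.org": "Politics",
}


def domain_for_url(url, default="Base"):
    """Look up the domain dictionary for a document by the host in its URL."""
    host = urlparse(url).hostname or ""  # hostname is None when no host is present
    return HOST_TO_DOMAIN.get(host, default)
```

Any host missing from the table falls back to the default dictionary, which is why such a table has to be maintained as new sites appear.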
In the case where a user manually selects a dictionary prior to translation, if the specialized field or domain of a source text to be translated is known in advance, it would be sufficient to manually select the corresponding dictionary. However, in the case where the specialized field is not known or cannot be determined, or where one source text is related to a plurality of specialized fields, inappropriate translated words may be selected if a particular domain dictionary has been selected in advance. Also, in the system in which domain dictionaries are switched depending on keywords, adequate keywords should be determined in advance, and the determined keywords should be reviewed whenever the domain dictionaries are updated (addition of new words, deletion of old words, etc.). Furthermore, it would be difficult to determine where in the source text dictionary switching should be done.
In the system in which the selection priority of the dictionaries is established based on the frequency (count data) of translated words, as in Japanese Unexamined Patent Publication No. 6-60117, single and compound words are not distinguished from each other. Therefore, there is an increased possibility that a wrong domain dictionary is selected when a common word has a domain-specific meaning (e.g., the English word “tour” is commonly translated into “ryoko” (kanji) in Japanese, but in a sports field such as golf, it should be translated into “tsuah” (katakana)). In the system which utilizes the document identification information (URL), as disclosed in Japanese Unexamined Patent Publication No. 10-21222, it is necessary to prepare a table for associating identification data with domains and to continue updating the table to accommodate ever-increasing Web sites.
The machine translation method and apparatus capable of automatically switching dictionaries, as described in the instant invention, are designed to eliminate the above drawbacks.