1. Technical Field
The present invention relates to a method and other measures for recognizing an emphasized word in a sentence to automatically translate the sentence.
2. Description of the Related Art
Machine translation programs have been used on information processing devices such as personal computers. These machine translation programs can automatically translate sentences displayed on monitors. They may be used for translating text in Web pages and e-mail messages on the Internet, for example.
The number of people using the Internet has increased in recent years, and with it the amount of information transmitted by individuals. Messages written by individuals are provided as-is through, for example, Web pages set up by individual users, Web pages such as message boards on which individual users can post messages, and online chat sites, which individual users can use to carry on interactive conversations. Machine translation programs are also used for translating such informal information provided by individual users.
Translation programs sometimes mistranslate, or cannot translate, information provided by individuals because of the colloquial words and phrases it contains. Colloquial words and phrases include many words that are not contained in the dictionary referred to by a translation program. Because any word that is not contained in the dictionary, even a verb or an adjective, is treated as a noun, the translation program can fail to translate the sentence correctly.
Examples of words that are not contained in dictionaries include words in which a character is intentionally duplicated in succession for emphasis (hereinafter referred to as emphasized words). English examples include “coool” for “cool”, containing an extra “o”, and “worrk” for “work”, containing an extra “r”. When a sentence containing such an emphasized word is translated, the adjective “coool” or the verb “worrk” is treated as a noun, resulting in an incorrect translation.
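The definition above can be illustrated with a minimal sketch, which is not the method of the invention. It flags a word as a possible emphasized word when the word is absent from the dictionary and contains a run of two or more identical letters. The dictionary contents and the names DICTIONARY, REPEAT_RUN, and is_possible_emphasized_word are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy dictionary; a real translation dictionary is far larger.
DICTIONARY = {"cool", "work", "hard", "this", "is"}

# A run of two or more identical lowercase letters, e.g. the "ooo" in "coool".
REPEAT_RUN = re.compile(r"([a-z])\1+")


def is_possible_emphasized_word(word, dictionary=DICTIONARY):
    """Return True if the word is unregistered and contains a repeated-letter run."""
    lowered = word.lower()
    if lowered in dictionary:
        # Registered words such as "cool" are not treated as emphasized,
        # even though they legitimately contain a doubled letter.
        return False
    return REPEAT_RUN.search(lowered) is not None
```

Under these assumptions, “coool” and “worrk” are flagged, while the registered word “cool” is not, although it also contains a doubled letter.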
If the emphasized form of “cool” were always written with three “o”s, as in “coool”, a correct translation could be obtained simply by adding “coool” to the dictionary. However, the number of successive duplicated characters in the word is not fixed: it may contain three, four, five, or more “o”s. Accordingly, the number of possible emphasized words is virtually infinite, and it is practically impossible to register all of them in a dictionary.
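Because the variants cannot be enumerated in advance, one conceivable alternative is to work in the opposite direction and shorten each run of repeated characters until a registered word is found. The following is a minimal sketch under that assumption, not the method of the invention. It assumes that the base form of a word contains at most two identical letters in a row, and the dictionary contents and the names restore_candidates and lookup_emphasized are hypothetical.

```python
import re
from itertools import product

# Hypothetical toy dictionary; a real translation dictionary is far larger.
DICTIONARY = {"cool", "work", "good"}


def restore_candidates(word):
    """List spellings obtained by shortening every repeated run to one or two characters."""
    # Split the word into runs of identical characters: "coool" -> "c", "ooo", "l".
    runs = [m.group(0) for m in re.finditer(r"(.)\1*", word.lower())]
    options = []
    for run in runs:
        if len(run) == 1:
            options.append([run])
        else:
            # Assumption: the base form holds either one or two of this character.
            options.append([run[0], run[0] * 2])
    return ["".join(parts) for parts in product(*options)]


def lookup_emphasized(word, dictionary=DICTIONARY):
    """Return the first candidate found in the dictionary, or None if there is none."""
    for candidate in restore_candidates(word):
        if candidate in dictionary:
            return candidate
    return None
```

With this sketch, “coool”, “cooool”, and “cooooool” all resolve to the single registered entry “cool”, so no emphasized variant has to be added to the dictionary. The number of candidates doubles with each repeated run, which is acceptable for ordinary words but would need a limit for pathological input.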
As described above, colloquial words are often used in text written by individuals, especially in chats, in which conversation is carried out by exchanging written messages, and such words often cause translation failures.
Emphasized words, in which identical characters are repeated in succession, are often used in informal, colloquial text. It may therefore be appropriate to use informal, casual expressions, rather than formal ones, in the translation of such text. The feel of such text is likely to be better preserved if the translation both uses an informal word and emphasizes the word that is equivalent to the word emphasized in the source text.
The present invention has been made to address the above-described technical problem, and an object of the present invention is to provide a method and other measures capable of properly translating sentences even if they contain unregistered words such as emphasized words.