1. Technical Field
The invention generally relates to a system, method and program product for the bidirectional translation of text. Specifically, the invention provides a bidirectional translation corpus that can be selectively used to translate phrases between two languages.
2. Background Art
With the increasing worldwide popularity of the Internet, the importance of automated and accurate translation of text has grown. For example, a web page could include text that is originally in the Spanish language. However, the viewing user may be located in the United States and wish to view the text in the English language. To date, either an alternate set of English language web pages has been provided for the user to view, or a “machine translation” technology/program has been used to translate the text of the web pages. Use of an alternate set of web pages is not desirable since it increases the amount of storage required, and updating the web site requires updating multiple pages. However, current machine translation technology does not consistently provide accurate translations.
One common problem in translating text arises when a word used in a source language has more than one meaning, and therefore more than one corresponding word in a destination language. This results in a “one to many” relationship between the word in the source language and words in the destination language. For example, the word “eagle” has a different meaning in a sports-related article than it has in a nature-related article. However, when the subject area is not known to the translation system, current machine translation technology would translate the word into the same destination language word in both instances. In general, the destination language word that corresponds to the most frequently used definition of the source word will be selected.
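The most-frequent-sense behavior described above can be illustrated with a minimal Python sketch. The lexicon layout, the Spanish renderings, the frequency values, and the function name `translate_word` are hypothetical and chosen only for illustration; they are not taken from any particular translation system.

```python
# Hypothetical one-to-many lexicon entry: one English word with several
# Spanish candidates. Target words and frequencies are illustrative assumptions.
LEXICON = {
    "eagle": [
        {"target": "águila", "subject": "nature", "frequency": 0.9},
        {"target": "eagle", "subject": "sports", "frequency": 0.1},
    ],
}


def translate_word(word, subject=None):
    """Translate a single word, optionally using a known subject area."""
    candidates = LEXICON.get(word)
    if not candidates:
        # Unknown words are passed through unchanged in this sketch.
        return word
    if subject is not None:
        for candidate in candidates:
            if candidate["subject"] == subject:
                return candidate["target"]
    # Without a matching subject area, fall back to the most frequent sense.
    return max(candidates, key=lambda c: c["frequency"])["target"]


nature_result = translate_word("eagle", subject="nature")
sports_result = translate_word("eagle", subject="sports")
default_result = translate_word("eagle")
```

When no subject area is supplied, `default_result` matches `nature_result`, so a sports-related article would receive the nature-related rendering.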
Some approaches have been proposed for addressing this problem. Many of these approaches provide a database (lexicon) having entries that can only be used for translations in a single direction. For example, many approaches require a first entry for an English to Spanish translation, and a second entry for a Spanish to English translation. As a result, two entries must be stored and maintained (in two separate unidirectional lexicons) to perform bidirectional translations. Further, many approaches only provide context-based translation on a word-by-word basis, rather than on entire phrases (i.e., a group of words or a complete sentence). As a result, these approaches do not consider any grammatical differences that may be appropriate for different contexts. Current approaches also perform poorly when an expression is used whose literal translation is meaningless in another language. For example, the French expression “C'est la vie” is meaningless when literally translated into Chinese or Spanish. In Spanish, it would be more accurate to translate the phrase as “Que sera sera.” Translation of such expressions can also result in erroneous “round trip” interpretations. For instance, when the English phrase “out of sight, out of mind” is translated into Russian and then back into English, one approach yields “invisible infinity.” Still further, many context-based approaches do not address situations in which the context for a word changes within the same sentence. For example, the sentence “the airplane crash caused the stock market to crash” uses the word “crash” in two different contexts. Many current context-based approaches would select one context or the other for the entire sentence, which results in an incorrect translation of at least one of the occurrences of “crash.”
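The sentence-level context problem in the “crash” example can be sketched in Python as follows. This is only an illustration of the failure mode under stated assumptions: the cue words, the context names, the Spanish renderings, and the function names are all hypothetical and do not describe any specific existing system.

```python
# Hypothetical context-keyed renderings for the English word "crash".
CRASH_BY_CONTEXT = {
    "aviation": "accidente",
    "finance": "desplome",
}

# Hypothetical cue words used to guess the context of a sentence.
CONTEXT_CUES = {
    "airplane": "aviation",
    "stock": "finance",
    "market": "finance",
}


def pick_sentence_context(words):
    """Return the context of the first cue word found, or None."""
    for word in words:
        if word in CONTEXT_CUES:
            return CONTEXT_CUES[word]
    return None


def translate_crash_occurrences(sentence):
    """Render every occurrence of "crash" using ONE context per sentence."""
    words = sentence.lower().split()
    context = pick_sentence_context(words)
    # The same context is applied to every occurrence, which is the flaw.
    return [CRASH_BY_CONTEXT.get(context, "crash") for w in words if w == "crash"]


renderings = translate_crash_occurrences(
    "the airplane crash caused the stock market to crash"
)
```

Because a single context is chosen for the whole sentence, both entries of `renderings` are the aviation rendering, and the stock market occurrence is translated incorrectly.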
The increasing popularity of web portal pages further complicates the translation of web content. A portal page is customized to deliver aggregated, personalized content to a computer user. Typically, a portal page is rendered and delivered to a viewing user from a portal server. A portal program such as WebSphere Portal Server, which is commercially available from International Business Machines Corp. of Armonk, N.Y., is loaded on the portal server. The portal program generally obtains and aggregates web content into a portal page. As is known in the art, a portal page includes sections or portlets that each contain particular web content formatted according to a user's preferences. For example, a user could establish his/her own portal page that has sections for news, weather, sports, etc. When the portal page is requested, the portal program would obtain the desired web content from the appropriate content providers. Once obtained, the web content would be aggregated, and then displayed as a portal web page. This portal technology has led to the explosion of personalized “home” pages for individual web users (e.g., MY.YAHOO.COM), which only increases the need for phrase-by-phrase translations of web content.
In view of the foregoing, there exists a need for a system, method and program product for the bidirectional translation of text, wherein a bidirectional translation corpus is used to translate phrases between two languages. Further, there exists a need to implement such a system, method, and program product for translating web content that may be provided via portal pages.