I. Related Application
This application is related to concurrently filed applications titled “System and Method for Network-based Teletranslation”, Ser. No. 09/294,026, filed on Apr. 20, 1999, and issued as U.S. Pat. No. 6,338,033, commonly assigned, and “System and Method for Internet-based Translation Brokerage Services”, Ser. No. 09/294,027, filed on Apr. 20, 1999, commonly assigned, and incorporates the commonly assigned applications by reference in their entirety for all purposes.
II. Field of the Invention
The present invention relates generally to language translation and, more specifically, to a system and method for enhancing document translatability.
III. Description of the Related Art
Today, as more and more businesses operate across international borders, they are often required to conduct business in more than one language. As a result, businesses often need to translate documents from one natural language to another.
In the past, businesses have relied on human-based translation (HT) to translate documents. Although HT generally produces high-quality work, it is inherently slow, labor-intensive, and often expensive. Human translators are quite often specialists in a given language pair (e.g., English/French). Hence, they cannot be freely reassigned to translation tasks in other language pairs, which imposes a certain rigidity on a business that employs them.
Because HT is labor-intensive, it is difficult to scale up when demand increases and to scale down when demand decreases. The capacity of any group of translators is fairly well defined. When capacity for a particular language pair must suddenly increase, adding translators creates problems of its own, such as harmonizing different styles, sharing glossaries and context information, and merging translated text.
The document to be translated is often submitted to the translators in a variety of formats, for example, computer printouts, faxes, word processing files, email attachments, and web pages. The translators are then left to handle each format and extract the translatable content. While requesters of translation services prefer that the translated document be returned in the format in which it was originally submitted, this is often not possible, because translators have varying technical skills and often are unable to restore the translated text to the original format.
For these reasons, machine translation software programs, also known as machine translation engines, have been developed to provide computerized translations. Today, the term Machine Translation (MT) is widely used in the industry to refer to computerized systems that translate documents from one natural language to another, with or without human assistance. It is important to note that the term MT does not include computer-based tools that support translators by providing access to dictionaries and terminology databases, or tools that facilitate the transmission and reception of machine-readable texts, or tools that interact with word processing, text editing or printing equipment. The term MT does, however, include systems in which translators or other users assist computers in the production of translations, including combinations of text preparation, on-line interactions and subsequent revisions of machine-translated documents.
While MT engines are useful, they have several disadvantages. MT engines are typically programmed to handle documents in only certain formats. For example, some MT engines accept rich text format (RTF), while others accept only ASCII files. As a result, businesses are often forced to turn down translation jobs because their MT engines cannot handle a particular format or, at best, must implement a non-trivial process for extracting the text to be translated from the formatting information and reinserting the translated text into it.
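One way to avoid rejecting jobs on format grounds is to separate the text from the markup, send only the text runs to the engine, and splice the results back in place. The Python sketch below illustrates that idea for a simple tag-based format. The function `translate_preserving_format`, the tag regular expression, and the toy word-for-word engine are hypothetical illustrations, not part of any real MT product; formats such as RTF would need a proper parser rather than a regular expression.

```python
import re
from typing import Callable

# Assumption: markup tags look like <...>; real RTF or word-processor
# formats would need a proper parser instead of this regular expression.
TAG_PATTERN = re.compile(r"(<[^>]+>)")


def translate_preserving_format(document: str,
                                translate: Callable[[str], str]) -> str:
    """Send only text runs to `translate`; copy tags through unchanged."""
    # Splitting on a capturing group keeps the tags in the resulting list.
    pieces = TAG_PATTERN.split(document)
    output = []
    for piece in pieces:
        if TAG_PATTERN.fullmatch(piece) or not piece.strip():
            output.append(piece)             # format information or blank run
        else:
            output.append(translate(piece))  # extracted text, reinserted in place
    return "".join(output)


# Hypothetical word-for-word engine standing in for a text-only MT engine.
TOY_LEXICON = {"Hello": "Bonjour", "world": "monde"}


def toy_engine(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split(" "))


print(translate_preserving_format("<p><b>Hello</b> world</p>", toy_engine))
# prints: <p><b>Bonjour</b> monde</p>
```

Passing whitespace-only runs through untouched keeps the engine from receiving empty segments. A production filter would also have to deal with words split across inline tags (for example `<b>Hel</b>lo`), which this sketch does not attempt.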
Documents sent to MT engines are typically composed of various types of information, e.g., text, graphics, diagrams, formatting information, and hyperlinks. Not all MT engines handle text, graphics, and hyperlinks equally well. Some MT engines, for instance, cannot identify hyperlinks, while others miss formatting tags.
Furthermore, the text itself may contain information of a more circumstantial nature, for example, information relating to a specific time or place. The phrase “Les Bouches du Rhône sont ravagées par le feu” should not be translated as “The mouths of the gutter are harrowed by fire” (“Les Bouches du Rhône” is the name of a small region in the south of France). Likewise, the phrase “Kohl hat alles verloren” should not be translated as “Cabbage has lost it all” (Kohl is a former German chancellor). In general, MT engines do not handle these special problems efficiently. Programming an MT engine to handle them would require adding many new lines of code to the engine, which in turn requires access to the engine's source code and/or the necessary programming interfaces. The code would also have to be updated constantly to account for new cases as they emerge, and the additional code risks slowing down the translation process. Finally, such code changes and additions are unique to each MT engine, so the same kind of changes must be made over and over, once for each engine.
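A common remedy that leaves the engine itself untouched is to mask such terms before translation and restore them afterward. The sketch below assumes a hypothetical do-not-translate glossary that maps each source term to its target-language rendering, and it assumes the engine copies opaque tokens such as `XDNT0X` through unchanged. The functions `protect_terms` and `restore_terms` are illustrative names, and the toy engine stands in for one that knows “Kohl” only as a vegetable.

```python
import re
from typing import Dict, Tuple


def protect_terms(text: str, glossary: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Swap each glossary term for an opaque placeholder before translation."""
    restore: Dict[str, str] = {}
    # Longer terms first, so a multi-word name is not split by a shorter entry.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"XDNT{i}X"  # hypothetical token the engine is assumed to keep
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(text):
            text = pattern.sub(token, text)
            restore[token] = glossary[term]
    return text, restore


def restore_terms(text: str, restore: Dict[str, str]) -> str:
    """Replace each placeholder with the glossary's target-language rendering."""
    for token, rendering in restore.items():
        text = text.replace(token, rendering)
    return text


def toy_engine(text: str) -> str:
    # Hypothetical engine that translates "Kohl" as the vegetable.
    rules = [("hat alles verloren", "has lost it all"), ("Kohl", "Cabbage")]
    for source, target in rules:
        text = text.replace(source, target)
    return text


sentence = "Kohl hat alles verloren"
print(toy_engine(sentence))                        # Cabbage has lost it all
masked, restore = protect_terms(sentence, {"Kohl": "Kohl"})
print(restore_terms(toy_engine(masked), restore))  # Kohl has lost it all
```

Because the glossary is data rather than engine code, new names can be added as they appear, and the same masking step can sit in front of any engine that preserves the tokens.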
For these reasons, it has been recognized that there is a need to enhance a document's translatability before submitting it to MT engines. There is a need for a system and method that allow MT engines to handle a wide variety of formats. Furthermore, there is a need for a system and method that allow MT engines to efficiently translate information of a more circumstantial nature, as described above, where the words used to express that information can vary widely and change quickly. Finally, there is a need to solve these special problems without changing the MT engines' code, and in a way that applies to many MT engines at once.
The present invention is directed to a teletranslation system and method for enhancing document translatability. The teletranslation system translates a document from one natural language to another. In one embodiment, the system comprises an aggregate filter and at least one MT engine for translating the processed document. The aggregate filter has a plurality of sections, each section having at least one atomic filter and performing one or more specific processes on the document, with the sections operating in a predetermined order. In one embodiment, the aggregate filter comprises a format conversion section, a text improvement section, a word tagging section, and a translation section. The aggregate filter analyzes the document based on a source text, format information, and a target language.
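The structure described above, an aggregate filter whose sections run in a fixed order with at least one atomic filter in each, can be sketched in Python as follows. The classes `Section` and `AggregateFilter`, the modeling of an atomic filter as a text-to-text function, the `[[...]]` word-tagging convention, and the toy engine in the translation section are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an atomic filter is modeled as a plain function from text to text.
AtomicFilter = Callable[[str], str]


@dataclass
class Section:
    """One stage of the aggregate filter, holding one or more atomic filters."""
    name: str
    filters: List[AtomicFilter]

    def __post_init__(self) -> None:
        # Each section must contain at least one atomic filter.
        if not self.filters:
            raise ValueError(f"section {self.name!r} has no atomic filter")

    def process(self, text: str) -> str:
        for atomic_filter in self.filters:
            text = atomic_filter(text)
        return text


@dataclass
class AggregateFilter:
    """Runs its sections strictly in the order given (the predetermined order)."""
    sections: List[Section]

    def process(self, text: str) -> str:
        for section in self.sections:
            text = section.process(text)
        return text


def toy_mt_engine(text: str) -> str:
    # Hypothetical stand-in for a real MT engine.
    return text.replace("Bonjour", "Hello")


pipeline = AggregateFilter([
    Section("format conversion", [lambda t: re.sub(r"<[^>]+>", "", t)]),
    Section("text improvement", [lambda t: " ".join(t.split())]),
    Section("word tagging", [lambda t: t.replace("Dupont", "[[Dupont]]")]),
    Section("translation", [toy_mt_engine]),
])

print(pipeline.process("<b>Bonjour</b>   Dupont"))  # prints: Hello [[Dupont]]
```

Keeping each section a list of small functions means a new special case can be handled by adding an atomic filter to the relevant section, without modifying the engine in the translation section.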
The method for enhancing document translatability comprises processing the document by an aggregate filter having a plurality of sections, each section having at least one atomic filter and the sections processing the document in a predetermined order, and translating the processed document by an MT engine. The method further comprises changing the format of the document at a format conversion section, modifying the text at a text improvement section, tagging words at a word tagging section, and translating the document at a translation section. The method further comprises preprocessing the document at the atomic filters in a first pass and post-processing it at the atomic filters in a second pass.
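The two-pass method can be illustrated by giving each atomic filter a preprocessing hook for the first pass and a post-processing hook for the second, with the MT engine running in between. In this Python sketch, the `AtomicFilter` protocol, the `ProtectUrls` and `NormalizeWhitespace` filters, and the choice to run the second pass in reverse order are assumptions; the text states only that preprocessing and post-processing occur in two passes. The example protects a hyperlink, one of the elements some MT engines mishandle.

```python
import re
from typing import Callable, List, Protocol


class AtomicFilter(Protocol):
    """Assumed interface: one hook per pass."""
    def preprocess(self, text: str) -> str: ...
    def postprocess(self, text: str) -> str: ...


class NormalizeWhitespace:
    def preprocess(self, text: str) -> str:
        return " ".join(text.split())

    def postprocess(self, text: str) -> str:
        return text  # nothing to undo after translation


class ProtectUrls:
    """Hides hyperlinks from the MT engine, then puts them back."""
    URL = re.compile(r"https?://\S+")

    def __init__(self) -> None:
        self.saved: List[str] = []  # state carried from first to second pass

    def preprocess(self, text: str) -> str:
        def stash(match) -> str:
            self.saved.append(match.group(0))
            return f"XURL{len(self.saved) - 1}X"
        return self.URL.sub(stash, text)

    def postprocess(self, text: str) -> str:
        for i, url in enumerate(self.saved):
            text = text.replace(f"XURL{i}X", url)
        return text


def run_two_pass(text: str, filters: List[AtomicFilter],
                 engine: Callable[[str], str]) -> str:
    for f in filters:            # first pass: preprocessing, in order
        text = f.preprocess(text)
    text = engine(text)          # machine translation
    for f in reversed(filters):  # second pass: post-processing (reverse order assumed)
        text = f.postprocess(text)
    return text


def toy_engine(text: str) -> str:
    # Hypothetical engine that blindly translates every "Seite" it sees.
    return text.replace("Siehe", "See").replace("Seite", "page")


doc = "Siehe   http://example.com/Seite"
print(toy_engine(doc))  # the URL path comes back as ".../page"
print(run_two_pass(doc, [NormalizeWhitespace(), ProtectUrls()], toy_engine))
# prints: See http://example.com/Seite
```

Keeping state on the filter instance (here, the list of saved URLs) is what lets the second pass undo the first, so a fresh `ProtectUrls` instance should be used for each document.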