1. Field of the Invention
The present invention relates generally to a computer-based document creation and translation system and, more particularly, to a system for authoring and translating constrained-language text to a foreign language with no pre- or post-editing required.
2. Related Art
Every organization whose activities require the generation of vast quantities of information in a variety of documents is confronted with the need to ensure their full intelligibility. Ideally, such documents should be authored in simple, direct language featuring all necessary expressive attributes to optimize communication. This language should be consistent so that the organization is identified through its single, stable voice. This language should be unambiguous.
The pursuit of this kind of writing excellence has led to the implementation of various disciplines designed to bring the authoring process under control. Yet authors of varied capabilities and backgrounds cannot comfortably be made to fit a uniform skill standard. Writing guidelines, rules and standards are elusive--difficult to define and enforce. Efforts aimed at both standardizing and improving on the quality of writing tend to meet with mixed results. However achieved and however successful, these results push up documentation authoring costs.
Recent attempts at surrounding authors with a software environment that might enhance their productivity and the quality of their writing have only succeeded in providing spell checkers. The effectiveness of other writing software has so far been disappointingly weak.
When the need to deliver information calls for the crossing of linguistic frontiers, the challenges multiply. The organization that needs to clear a channel for its information flow finds itself to a great extent, if not totally, dependent on translation.
Translation of text from one language to another language has been done for hundreds of years. Prior to the advent of computers, such translation was done completely manually by experts, called translators, who were fluent in the language of the original text (source text) and in the language of the translated text (target text). Typically, it was preferable for the translator to have originally learned the target language as his/her native tongue and subsequently have learned the source language. Such an approach was felt to result in the most accurate and efficient translation.
Even the most expert translator must take a considerable amount of time to translate a page of text. For example, it is estimated that an expert translator translating technical text from English to Japanese can only translate approximately 300 words (approximately one page) per hour. It can thus be seen that the amount of time and effort required to translate a document, particularly a technical one, is extensive.
The requirements for translation in business and commerce have grown steadily in the last hundred years. This is due to several factors. One is the rapid increase in the text associated with conducting business internationally. Another is the large number of languages that such texts must be translated into in order for a company to engage in global commerce. A third is the rapid pace of commerce, which has resulted in frequent revisions of text documents, which in turn require subsequent translation of new versions.
Many organizations have the responsibility for creating and distributing information in multiple languages. In the global marketplace, the manufacturer must ensure that its manuals are widely available in the host languages of its target markets. Manual translation of documents into foreign languages is a costly, time-consuming, and inefficient process. Translations are usually inconsistent owing to the individual interpretation of the translators, who are not necessarily well-versed in the application-specific language used in the documentation. Because of these problems, fewer manuals than would be ideal are actually translated.
In the areas of research and development, the explosion of knowledge which has occurred in the last century has also geometrically increased the need for the translation of documents. No longer is there one predominant language for documents in a particular field of research and development. Typically, such research and development activities are taking place in several advanced industrialized countries, such as, for example, the United States, United Kingdom, France, Germany, and Japan. Many times there are additional languages containing important documents relating to the particular area of research and development. Advances in technology, particularly in electronics and computers, have further accelerated the production of text in all languages.
The ability to produce text is directly proportional to the capability of the technology that is used. When documents had to be hand-written, for example, an author could only produce a certain number of words per unit of time. This increased significantly, however, with the advent of mechanical devices, such as typewriters, mimeograph machines, and printing presses. The advent of electronic, computer, and optical technology increased the capability of the author even further. Today, an average author can produce significantly more text in a given unit of time than any author could produce using the hand-written methods of the past.
This rapid increase in the amount of text, coupled with enormous advances in technology, has caused considerable attention to be paid to the subject of translation of text from its source language to a target language(s). Considerable research in universities, as well as in private and governmental laboratories, has been devoted to determining how translation can be accomplished without the intervention of a human translator.
Computer-based systems have been devised which attempt to perform machine translation (MT). Such computer systems are programmed so as to attempt to automatically translate source text as an input into target text as an output. However, researchers have discovered that such computer systems for automatic machine translation are impossible to implement using present technology and theoretical understanding. No system exists today which can perform the machine translation of a source natural language to a target natural language without some type of editing by expert editors/translators. One method is discussed below.
In a process called pre-editing, source text is initially reviewed by a source editor. The task of the source editor is to make changes to the source text so as to bring it into conformance with what is known to be the optimal state for translation by the machine translation system. This conformance is learned by the source editor through trial and error.
The pre-editing process just described may go through iterations by additional source editors of increasing competence. The source text thus prepared is submitted for processing to the machine translation system. The output is target language text which, depending on the purposes of the translation or the quality requirements of the user, may or may not be post-edited.
If the translation quality required must be comparable to that of proficient human translation, the output of machine translation will most likely have to be post-edited by a competent translator. This is due to the complexity of human language and the comparatively modest capabilities of the machine translation systems that can be built with present technology, within natural limitations of time and resources, and with a reasonable expectation of meeting cost-effectiveness requirements. Most of the modest systems that are built require, indeed, the post-editing activity, intended to approximate, by whatever measure, the quality levels of purely human translation.
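The conventional workflow described above (one or more pre-editing passes, machine translation, then optional post-editing) can be illustrated with a minimal sketch. The sketch is illustrative only and does not represent any particular prior-art system; the function name `translate_with_editing` and its parameters are hypothetical, and the editors and the machine translation engine are modeled simply as functions from text to text.

```python
from typing import Callable, List, Optional

# An editor (human or otherwise) is modeled as a function from text to text.
Editor = Callable[[str], str]


def translate_with_editing(
    source_text: str,
    pre_editors: List[Editor],
    machine_translate: Callable[[str], str],
    post_editor: Optional[Editor] = None,
) -> str:
    """Hypothetical sketch of the pre-edit / MT / post-edit workflow."""
    text = source_text
    # Pre-editing may go through several iterations by successive source editors.
    for pre_edit in pre_editors:
        text = pre_edit(text)
    # The prepared source text is submitted to the machine translation system.
    target_text = machine_translate(text)
    # Post-editing is applied only when the quality requirements call for it.
    if post_editor is not None:
        target_text = post_editor(target_text)
    return target_text
```

In such a sketch, every entry in `pre_editors` and the `post_editor` stands for human labor, which is the cost that a system requiring neither pre- nor post-editing would avoid.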
One such system is the KBMT-89 designed by the Center for Machine Translation, Carnegie Mellon University, which translates English to Japanese and Japanese to English. It operates with a knowledge-based domain model which aids in interactive disambiguation (i.e., editing of the document to make it unambiguous). However, this interactive disambiguation is not typically done interactively with an author. Once the system finds an ambiguous sentence that it cannot disambiguate, it must stop the process and resolve ambiguities by asking an author/translator a series of multiple-choice questions. In addition, since the KBMT-89 does not utilize a well-defined controlled input language, the so-called translator-assisted interactive disambiguation produces text which requires post-editing.
In view of the above, it would be advantageous to have a translation system that eliminates both pre- and post-editing.