The present invention relates to natural language translation and, in particular, to computer-implemented methods and apparatus for use in natural language translation of a source material in a source natural language into a target natural language.
Translation memories have been employed in the natural language translation industry for decades with a view to making use of previously translated text of high translation quality in current machine-assisted translation projects. Conventionally, translation memories leverage existing translations on the sentence or paragraph level. Because a sentence or paragraph is a large unit of granularity, the amount of re-use possible is limited: the chance of a whole sentence or paragraph in the translation memory matching the source text is relatively low.
One way to improve leverage of previous translations is through the use of a term base or multilingual dictionary which has been built up from previous translations over a period of time. The development and maintenance of such term bases require a lot of effort and, in general, the input of skilled terminologists. Recent advancements in the area of extraction technology can reduce the amount of human input required in the automatic extraction of term candidates from existing monolingual or bilingual resources. However, the human effort required in creating and maintaining such term bases can still be considerable.
A number of source code text editors include a feature for predicting a word or phrase that the user wants to type, without the user having to type the word or phrase completely. Similarly, some word processors, such as Microsoft Word™, use internal heuristics to suggest potential completions of a typed-in prefix in a single natural language.
US patent application no. 2006/0256139 describes a predictive text personal computer with a simplified computer keyboard for word and phrase auto-completion. The personal computer also offers machine translation capabilities, but no previously translated text is re-used.
There is therefore a need to improve the amount of re-use of previously translated text in machine-assisted translation projects, whilst reducing the amount of human input required.