In the field of computational linguistics, in which computer software is used to translate from one language to another, it is known to provide Translation Memories (TMs). A TM is a database that stores previously translated segments, such as words, phrases, sentences or paragraphs, that have been translated by human translators and that may be accessed by human translators to aid translation. TMs are typically used to assist translators and post-editors in a Computer Assisted Translation (CAT) environment by returning the most similar translated segments, thereby avoiding duplication of translation work. The TM stores the source text and its corresponding translation (target text) as language pairs known as “translation units”.
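By way of illustration only, the storage of translation units and the retrieval of the most similar previously translated segment may be sketched as follows. The class names `TranslationUnit` and `TranslationMemory`, the method `lookup`, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are assumptions made for this sketch; they are not taken from any particular TM product.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class TranslationUnit:
    """One source segment paired with its human translation (hypothetical type)."""
    source: str
    target: str


class TranslationMemory:
    """A minimal in-memory TM; a real TM would be a persistent database."""

    def __init__(self):
        self._units = []

    def add(self, source, target):
        # Store the source text together with its translation.
        self._units.append(TranslationUnit(source, target))

    def lookup(self, query, top_n=1):
        # Score every stored source segment against the query.
        # SequenceMatcher.ratio() is a stand-in similarity in [0, 1].
        scored = [
            (SequenceMatcher(None, query.lower(), unit.source.lower()).ratio(), unit)
            for unit in self._units
        ]
        # Most similar segments first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_n]


tm = TranslationMemory()
tm.add("Click the Save button.", "Cliquez sur le bouton Enregistrer.")
tm.add("Close the window.", "Fermez la fenêtre.")

score, unit = tm.lookup("Click the Cancel button.")[0]
print(unit.target)
```

In this sketch the translator is offered the stored translation of the closest source segment and edits it, rather than translating the new segment from scratch.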
Another technique used in a CAT environment is statistical machine translation (SMT), a machine translation method in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. In linguistics, a corpus (plural: corpora) is a large and structured set of texts. Corpora, which may be electronically stored and processed, facilitate statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules. With the rapid development of SMT, machine translation (MT) systems are beginning to generate acceptable translations, especially in domains where abundant parallel corpora exist.
However, advances in SMT are being adopted only slowly, and sometimes somewhat reluctantly, in professional localization and post-editing environments. The reasons include the usefulness of the TM, the investment and effort that companies have put into TMs, and the lack of robust SMT confidence estimation measures that are as reliable as fuzzy match scores. Currently, the localization industry relies on TM fuzzy match scores to obtain both a good approximation of post-editing effort and an estimate of the overall translation cost.
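For illustration, a fuzzy match score may be sketched as one minus the word-level edit distance between a new segment and a stored TM source segment, normalized by the length of the longer segment. This formulation is an assumption chosen for the example; commercial CAT tools use their own, often proprietary, scoring formulas, and the function names below are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete word_a
                curr[j - 1] + 1,     # insert word_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_match_score(segment, tm_source):
    """Return a similarity in [0, 1]; 1.0 means an exact word-level match."""
    a = segment.lower().split()
    b = tm_source.lower().split()
    if not a and not b:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / max(len(a), len(b))


print(fuzzy_match_score("click the save button", "click the cancel button"))  # 0.75
```

A higher score indicates that fewer words need to be changed in the retrieved translation, which is why such scores serve as a proxy for post-editing effort.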
There is therefore a need for a translation system which addresses at least some of the drawbacks of the prior art.