Language translation is both a science and an art. Because there are many languages, and each language has many changes and variations beyond its rules, translating one language into another requires a great deal of mental effort and inventiveness. Machine translation has been proposed since the 1930s. With the development of computer technology, various computer translation systems and technologies have been developed, such as ED (Electronic Dictionary), MT (Machine Translation), TM (Translation Memory), IT (Interactive Translation) and CAT (Computer Aided Translation).
These systems perform language translation in different ways. For example, an Electronic Dictionary can only translate a single word, or look up a word in a dictionary.
Traditional MT technology performs language translation according to grammar rules. These rules are summarized by language specialists, written into the translation program by a programmer, and can be amended only by the programmer. Because of the richness and flexibility of language, such grammar rules cannot cover all language phenomena. Therefore, the translation quality of traditional MT technology is low, especially for long and complicated sentences.
With the rapid increase in CPU processing speed and in the storage capacity of recording media, statistical machine translation (SMT) technology and translation memory (TM) technology have been proposed. The basic idea is to store a vast number of bilingual sentence-pairs and to obtain the target text for an input source sentence by extracting already translated and stored portions from those sentence-pairs. Translation memory technology is the right direction for high-quality computer translation.
FIG. 1A shows the translation scheme of the traditional TM translation technology. The TM translation module compares the input source sentence with the source part of each bilingual sentence-pair in the corpus (matching processing). If the two match fully, or if a preset match factor is satisfied, the target part of that bilingual sentence-pair is output as the TM translation result.
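The matching processing described above can be illustrated with a minimal Python sketch. It is not the scheme of FIG. 1A itself: the corpus contents, the function name `tm_translate`, the default threshold of 0.8 and the use of the standard-library `difflib.SequenceMatcher` ratio as the "match factor" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical corpus of (source, target) sentence-pairs.
CORPUS = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Close the cover.", "Fermez le couvercle."),
]


def tm_translate(source, corpus=CORPUS, threshold=0.8):
    """Return the target part of the best-matching pair, or None.

    The similarity ratio (0.0 to 1.0) stands in for the preset match
    factor; a ratio of 1.0 corresponds to a full match.
    """
    best_score, best_target = 0.0, None
    for stored_source, stored_target in corpus:
        score = SequenceMatcher(None, source, stored_source).ratio()
        if score > best_score:
            best_score, best_target = score, stored_target
    # Output the stored target only if the match factor is satisfied.
    return best_target if best_score >= threshold else None
```

Note that a near match such as "Press the power button!" returns the stored target unchanged; the memory can only reuse what it already holds, it cannot adapt the translation to the new sentence.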
FIG. 1B shows an example of a sentence-pair recorded by the traditional sentence-pair recording method. In the example, the source text is recorded in the left part, the target text is recorded in the right part, and a separator lies between them. Both the source text and the target text contain only plain text content, such as words, characters and punctuation marks, in their respective languages. Apart from the separator between the source text and the target text, the record carries no information intended to support translation. The usefulness of this kind of sentence-pair is therefore very limited: an accurate translation result can be obtained from the sentence-pair only for an identical source sentence, not for a similar one.
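A short Python sketch may make this limitation concrete. The tab character used as separator, the sample record and the names `parse_record` and `load_memory` are assumptions; FIG. 1B does not specify them here.

```python
SEPARATOR = "\t"  # assumed separator between source text and target text


def parse_record(line):
    """Split one plain-text record into (source, target)."""
    source, target = line.rstrip("\n").split(SEPARATOR, 1)
    return source, target


def load_memory(lines):
    """Build a lookup table keyed by the exact source text."""
    return dict(parse_record(line) for line in lines)


# Hypothetical record: source on the left, target on the right.
memory = load_memory(["Close the cover.\tFermez le couvercle.\n"])

exact = memory.get("Close the cover.")   # identical sentence: target found
similar = memory.get("Close the lid.")   # similar sentence: nothing found
```

Because each record holds only two strings, nothing in it indicates which part of the target corresponds to which part of the source, so a sentence that differs by a single word finds no usable result.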
Therefore, for the traditional TM technology to give an accurate translation result for any sentence, all possible sentences, in addition to the already translated sentence-pairs, would have to be accumulated. However, it is almost impossible to accumulate all sentence-pairs of a pair of languages, owing to the richness, flexibility and arbitrariness of sentences written by different authors. In other words, the number of sentences and sentence-pairs is unlimited or immeasurable. In practice, we accumulated hundreds of thousands of sentence-pairs in one professional field at a large cost in labor and money; nevertheless, the coverage (repetition rate) measured in a translation test was only a few thousandths. The TM translation technology therefore faces a major obstacle. This recalls the advantage of the traditional MT technology, namely covering more sentences with fewer grammar rules or sentence templates. As a result, the MT technology has been combined with the TM technology to form a hybrid computer translation strategy.
The inventor of this patent application has developed an intelligent computer translation system that records and stores intellectualized sentence-pairs, using artificial intelligence to improve the efficiency and coverage rate of the bilingual sentence-pairs. For more information, please refer to the web site: www.aitrans.net.
In recent years, other modified TM technologies have arisen. For example, sentence templates are used in a TM solution with the aim of covering more sentences with each sentence template stored in a template base. In the sentence template technology, a translation example is abstracted into a sentence template that retains only the syntactic words and inserts special symbols as slots to be filled. Syntactic analysis is performed on the input sentence to create a syntax tree, and the target text for the input sentence is then obtained by comparing the syntax tree with the sentence template. This method actually moves back toward the traditional MT technology, because abstracting a translated sentence-pair into a syntactic sentence template is hard work that needs much time and human labor and cannot be performed automatically. Moreover, although the sentence template improves the coverage factor, i.e. the versatility of translation, the uniqueness of each sentence is lost and the translation accuracy for some special sentences declines. Because the means for accumulating sentence templates are imperfect and impractical, no practical example of this technology has been seen so far.
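The idea of a template with a fill-in slot can be sketched in Python as follows. This is a deliberately simplified illustration: it replaces the syntactic analysis and syntax-tree comparison described above with plain regular-expression slot matching, and the template pair, the slot symbol `{X}`, the lexicon and the function name are all hypothetical.

```python
import re

# Hypothetical template pair: fixed words are kept, {X} marks the slot.
TEMPLATE_SOURCE = "I like {X}."
TEMPLATE_TARGET = "J'aime {X}."

# Hypothetical lexicon used to translate the slot filler.
LEXICON = {"music": "la musique", "apples": "les pommes"}


def translate_with_template(sentence):
    """Translate a sentence that fits the template, else return None."""
    prefix, suffix = TEMPLATE_SOURCE.split("{X}")
    pattern = re.compile(re.escape(prefix) + r"(?P<X>.+)" + re.escape(suffix))
    match = pattern.fullmatch(sentence)
    if match is None:
        return None  # the sentence does not fit the template
    filler = LEXICON.get(match.group("X"))
    if filler is None:
        return None  # the slot content is unknown to the lexicon
    return TEMPLATE_TARGET.replace("{X}", filler)
```

One template covers every sentence of the form "I like ...", which illustrates the gain in coverage; at the same time every such sentence is translated in the same fixed way, which illustrates the loss of sentence uniqueness noted above.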