In the field of computer language text information processing, machine translation poses the greatest technical difficulty.
As mentioned in Section 1, “Machine Translation”, of Chapter 8 of Computer-Based Natural Language Processing, written by Feng Zhiwei and published by Shanghai Foreign Language Education Press in October 1996, “The ‘semantic barriers’ encountered in machine translation, which were pointed out in an American ALPAC report in 1964, still exist nowadays, and no breakthrough development has been achieved in the machine translation technology up to now.” In practical use and commercialization, machine translation systems face a critical challenge.
In “Where Is the Road for Machine Translation” in Issue 2 of POPSOFT, 2004, author Wang Shuo, after interviewing several industry experts on MT (machine translation), pointed out: “Inherent problems of the machine translation technology are killers that hinder its development. Currently, no great breakthrough is achieved in China, and even in the whole world. In a short term, it is impossible to improve accuracy of translation by trying to use a machine with limited rules and corpora. Under a circumstance of an immature language intelligence research theory, MT software research encounters a technical bottleneck. It is impossible to solve a problem of selecting a sense of a word in different language contexts, and also impossible to correctly select a grammatical rule in varying complicated language contexts. Therefore, the translation quality cannot be improved remarkably.” That is also why current machine translation software cannot meet users' requirements and why the results of such translation are always ridiculous . . . . TM (translation memory) is designed for professional translators and organizations, and requires that the user be able to translate independently. Its principle is that all previously translated material is stored in a database in units of sentences. During translation, the machine automatically analyzes an electronic document, automatically replaces sentences that match 100%, and offers translation suggestions for sentences that match less than 100% according to the degree of match; new sentences depend entirely on human translation. Finally, the author reiterated: “Inherent problems of the machine translation technology are killers that hinder its development. Currently, no great breakthrough is achieved in China, and even in the whole world.”
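The translation-memory principle described above (exact matches replaced automatically, partial matches offered as suggestions, new sentences left to the translator) can be sketched as follows. This is only an illustrative sketch, not the method of any particular TM product: the function name `tm_lookup`, the 0.7 threshold, and the use of character-level similarity from Python's `difflib` are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def tm_lookup(sentence, memory, threshold=0.7):
    """Look up one source sentence in a translation memory.

    `memory` maps previously translated source sentences to their
    translations. Returns a tuple (kind, translation, score), where kind is
    "exact", "fuzzy", or "new". The threshold is an arbitrary assumption.
    """
    # 100% match: the stored translation can be reused directly.
    if sentence in memory:
        return ("exact", memory[sentence], 1.0)

    # Otherwise find the most similar stored sentence (character-level ratio).
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, sentence, source).ratio()
        if score > best_score:
            best_source, best_score = source, score

    # Partial match: offer the stored translation only as a suggestion.
    if best_source is not None and best_score >= threshold:
        return ("fuzzy", memory[best_source], best_score)

    # No usable match: the sentence must be translated by a human.
    return ("new", None, best_score)
```

In this sketch a "new" result carries no translation at all, which mirrors the limitation noted in the text: the memory contributes nothing for sentences it has not seen before.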
In “Current Situation of Translation Memory Machine and Its Enlightenment” in Issue 5 of Foreign Language Research, 2007, author Su Mingyang pointed out an inherent limitation of the translation memory technology, whose guiding idea is that “A same sentence never requires retranslation.” However, “In reality, most translation activities lack repetitiveness, and a percentage of text repetitions exist only in some particular fields.”
In “Translation Memory Theory and Evaluation on Several Types of Computer-Aided Translation Software” in Issue 2 of Journal of Hunan Medical University, March 2010, author Fu Yanfu reviewed and analyzed the development of MT, and considered that its translation quality was indeed still unsatisfactory although it had been under development for over 70 years. “No wonder people consider artificial intelligence as one of ten difficult problems in human science and technology in the 21st century. In this case, a computer-aided translation machine based on the translation memory (TM) technology emerges.”
“Translation memory software generally provides translation tools such as translation memory, terminology database management, translation project management, corpus database processing and application, and so on.”
Corpus database processing means performing sentence alignment on translated corpora and creating a database from the sentence pairs obtained after the bilingual or multilingual semantic content is aligned; this database is called a “sentence database” or a “memory database”.
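As a rough illustration of the sentence database just described, the sketch below stores aligned sentence pairs in an in-memory SQLite database. It assumes that alignment has already been performed and that the two input lists correspond one-to-one by position; real sentence alignment is a separate and harder step that is not shown. The names `build_sentence_database`, `lookup`, and the table `sentence_pairs` are hypothetical.

```python
import sqlite3


def build_sentence_database(source_sentences, target_sentences):
    """Create a sentence database from already-aligned sentence lists.

    Assumption: source_sentences[i] is the translation counterpart of
    target_sentences[i]; the alignment itself is not computed here.
    """
    if len(source_sentences) != len(target_sentences):
        raise ValueError("aligned corpora must contain the same number of sentences")

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentence_pairs (source TEXT PRIMARY KEY, target TEXT NOT NULL)"
    )
    # A later pair with the same source sentence overwrites the earlier one.
    conn.executemany(
        "INSERT OR REPLACE INTO sentence_pairs (source, target) VALUES (?, ?)",
        zip(source_sentences, target_sentences),
    )
    conn.commit()
    return conn


def lookup(conn, source):
    """Return the stored translation of `source`, or None if it is absent."""
    row = conn.execute(
        "SELECT target FROM sentence_pairs WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else None
```

Keying the table on the whole source sentence reflects the sentence-level granularity of a memory database: a lookup succeeds only when the entire sentence has been stored before.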
CN200910002334.1 discloses a method for machine translation based on a combination of examples and phrases. Although the translation granularity of this method is more appropriate than that of methods based on characters and words or on sentences, the target language text is obtained on the fly merely by an algorithm, and quality is difficult to ensure without human correction. No database is created, so accumulation and long-term reuse are impossible.
The prior art has the following disadvantages: ① Translation quality of MT is poor. ② TM requires that a user should have independent translation capabilities.
The inventor of the present invention considers that the disadvantages of the prior art stem chiefly from simply having a computer imitate the human brain without thoroughly understanding language texts. The prior art fails to recognize, from an interlingual perspective, that language texts are in substance ideographs. Consequently, different language texts cannot be associated according to their ideographs, and no database can be created from such associations for long-term use. The rule “ideographs of different language texts are implemented by using four types of common ideographic components” is neither understood nor utilized, so language texts naturally cannot be operated on in units of ideographic components within a computer or across networks, and it is therefore difficult to overcome the “semantic barriers”. Nor can ideographic components be used to create a database that supports machine translation and other language text information processing applications.
In the field of computer language text information processing in the prior art, encoding is uniformly character-oriented, and texts are generated from character codes. Storage, transmission, and even machine translation between different language texts in a computer, including machine translation supported by an electronic dictionary and a sentence database, are also based on characters. ③ No ideographic association exists between different language texts in the prior art. ④ When characters, words, elements, and even multiple nodes in the semantic content of a sentence are processed, word senses and semantic content are lost and cannot be recovered.
In conclusion, the foregoing four disadvantages of the prior art have become four technical problems that shackle the field.