1. Field of the Invention
This invention relates to a Chinese generation apparatus for machine translation, which utilizes statistical data instead of a large number of semantic and syntactic rules.
2. Description of the Related Art
In the 20th century, people have to keep learning to avoid becoming disconnected from society. Because most new knowledge comes from foreign countries, document translation is important for reading foreign documents efficiently. To improve the quality and efficiency of document translation, a recent trend is to use computers instead of humans for translation work. Such an apparatus is commonly called a machine translation apparatus. In a machine translation apparatus, the input language to be translated is known as the source language, while the output language into which the input is translated is known as the object language. For example, the source language of a Japanese-to-Chinese machine translation apparatus is Japanese, while its object language is Chinese. Furthermore, the translation format used in a machine translation apparatus may be the direct form, the intermediate converting form or the pivot form, depending on the characteristics of the languages to be translated. Generally, the intermediate converting form is the most commonly used.
Referring to FIG. 8, a conventional machine translation apparatus employing the intermediate converting form includes a source language parsing unit 1, an intermediate structure converting unit 2, an object language generating unit 3 and a dictionary unit 4. The quality of machine translation depends on three conditions: whether the input sentence is correctly parsed in the source language parsing unit 1; whether the differences between the source language and the object language are eliminated in the intermediate structure converting unit 2 (e.g. by resolving differences in syntax or meaning, or by selecting the translation of each lexicon item); and whether the object language is correctly generated in the object language generating unit 3 in accordance with the syntactic rules of the object language.
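The flow through the units of FIG. 8 can be illustrated with a minimal Python sketch. All function names, the flat (role, word) structure, the toy dictionary entries and the role order below are hypothetical and are not taken from any actual apparatus. The sketch only shows how the three stages are chained and at which stage each quality condition arises.

```python
from typing import Dict, List, Tuple

# Toy intermediate structure: a flat list of (role, word) pairs.
# A real apparatus would use a dependency tree; this is a simplification.
Structure = List[Tuple[str, str]]


def parse_source(sentence: str) -> Structure:
    """Source language parsing unit (toy): each token is written 'role:word'."""
    structure = []
    for token in sentence.split():
        role, word = token.split(":", 1)
        structure.append((role, word))
    return structure


def convert(structure: Structure, dictionary: Dict[str, str]) -> Structure:
    """Intermediate structure converting unit (toy): lexical selection only."""
    return [(role, dictionary.get(word, word)) for role, word in structure]


def generate(structure: Structure, role_order: List[str]) -> str:
    """Object language generating unit (toy): arrange roles by a target order."""
    rank = {role: i for i, role in enumerate(role_order)}
    ordered = sorted(structure, key=lambda item: rank.get(item[0], len(role_order)))
    return " ".join(word for _, word in ordered)


def translate(sentence: str, dictionary: Dict[str, str], role_order: List[str]) -> str:
    """Chain the three units; the dictionary stands in for dictionary unit 4."""
    return generate(convert(parse_source(sentence), dictionary), role_order)


# Hypothetical placeholder words, not real vocabulary.
TOY_DICTIONARY = {"src_he": "tgt_he", "src_book": "tgt_book", "src_read": "tgt_read"}
result = translate("subj:src_he obj:src_book verb:src_read",
                   TOY_DICTIONARY, ["subj", "verb", "obj"])
print(result)
```

In this toy run the verb-final input is re-ordered to subject, verb, object in the generating stage. An error in any one stage (a wrong role from parsing, a wrong dictionary choice, or a wrong role order) would propagate to the output.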
A Chinese sentence, however, may take on a different meaning when the locations of the lexicons in the sentence are changed. For example, in the sentences [{character pullout}] (He is jumping on a table.) and [{character pullout}] (He jumps onto a table.), the location of "{character pullout}" (on a table) in the former sentence differs from that in the latter sentence, so the two sentences have different meanings. Therefore, some lexicons in a Chinese sentence must be arranged in a given sequence; otherwise, an incorrect Chinese sentence may be generated. The following is an example, wherein the time lexicon must be placed before the location lexicon.
(correct Chinese sentence) {character pullout}{character pullout}. (Literally: He*yesterday*at school*ate) (He ate dinner at school yesterday.)
(incorrect Chinese sentence) {character pullout}{character pullout}. (Literally: He*at school*yesterday*ate)
On the other hand, the sequence of some lexicons in a Chinese sentence is unrestrained. The following is an example, wherein the time lexicon may be placed before or after the subject.
(the time lexicon is placed before the subject) {character pullout}. (Yesterday he went to school.)
(the time lexicon is placed after the subject) {character pullout}. (He went to school yesterday.)
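The two kinds of ordering behavior described above (a required sequence for some lexicons and a free sequence for others) can be sketched as a filter over permutations. This is only an illustration: the role labels are hypothetical, and of the three precedence pairs below only "time before location" is stated in the text; the other two are assumptions added to pin down the remaining items.

```python
from itertools import permutations
from typing import Iterable, List, Tuple


def allowed_orders(items: Iterable[str],
                   must_precede: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Return every ordering of items that satisfies all (before, after) pairs."""
    result = []
    for perm in permutations(items):
        position = {item: i for i, item in enumerate(perm)}
        if all(position[a] < position[b]
               for a, b in must_precede
               if a in position and b in position):
            result.append(perm)
    return result


CONSTRAINTS = [
    ("time", "location"),       # stated in the text
    ("subject", "location"),    # assumption for illustration
    ("location", "predicate"),  # assumption for illustration
]

orders = allowed_orders(("subject", "time", "location", "predicate"), CONSTRAINTS)
for order in orders:
    print(" * ".join(order))
```

Under these constraints exactly two orderings survive: one with the subject before the time lexicon and one with the time lexicon before the subject. The location lexicon always follows the time lexicon. No pair is declared between subject and time, which is how the sketch models an unrestrained sequence.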
Therefore, if the object language of a machine translation apparatus is Chinese, the most important problem to be solved is how to correctly determine the arrangement sequence of the lexicons in a Chinese sentence. Referring to FIG. 9, R.O.C. Pat. Publication No. 324804 discloses a Chinese generation apparatus for machine translation.
A preprocessing unit 200 of the Chinese generation apparatus in FIG. 9 receives as input the Chinese sentence dependency structure shown in FIG. 10A, which is an intermediate structure, and uses a dummy node to recover the subject node of any sub-structure in which the subject is omitted. Next, a basic item spreading unit 300 generates a basic sentence structure including the basic items, as shown in FIG. 10B, in accordance with the basic sentence patterns stored in the basic sentence pattern memory unit 350, using the verb classification code of the main item (verb or adjective) of each sub-structure as the searching key.
An unrestrained item spreading unit 400 uses the case marker of each unrestrained item, the surface case marker of the source language, the semantic dominating code and the item's own semantic code in the dependency structure as the searching key to retrieve, from the sentence item information memory unit 450, the surface case marker of the phrase head, the surface case marker of the phrase tail and the sentence item slot. It then generates the sentence structure of each unrestrained item, as shown in FIG. 10C, by placing the item at the location in the sentence structure that corresponds to its sentence item slot.
A special sentence pattern generation unit 500 generates the special sentence pattern sentence structure of FIG. 10D in accordance with the special sentence pattern attribute of each verb or adjective. As shown in FIG. 10E, an item location adjusting unit 600 retrieves, in order, the item arrangement sequence limitation of each sentence item slot from the sentence formation item sequence memory unit 650 and adjusts the item arrangement sequence in each sentence item slot of the sentence structure accordingly. Afterwards, a post processing unit 700 generates the other accessory items and the punctuation for the sentence structure and lines up the sentence structure into a sentence. An output unit 800 outputs the translation result "{character pullout}{character pullout}" (I put the book in the car.). A buffer unit 900 temporarily stores the output from the basic item spreading unit 300, the unrestrained item spreading unit 400 and the item location adjusting unit 600.
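The adjustment performed by the item location adjusting unit 600, followed by the lining-up step, can be pictured with the following minimal sketch. The data layout (a dictionary from slot number to a list of (case marker, word) pairs), the sequence table contents and the English glosses are all assumptions made for illustration; the cited publication's actual data structures are not reproduced here.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (case marker, word); hypothetical layout


def adjust_item_locations(structure: Dict[int, List[Item]],
                          slot_sequences: Dict[int, List[str]]) -> Dict[int, List[Item]]:
    """Re-order the items inside each slot according to a stored sequence."""
    adjusted = {}
    for slot, items in structure.items():
        sequence = slot_sequences.get(slot, [])
        rank = {marker: i for i, marker in enumerate(sequence)}
        # Markers missing from the table sort last and keep their input order,
        # because Python's sort is stable.
        adjusted[slot] = sorted(items, key=lambda item: rank.get(item[0], len(sequence)))
    return adjusted


def line_up(structure: Dict[int, List[Item]], slot_order: List[int]) -> List[str]:
    """Flatten the slots, in slot order, into a word sequence."""
    return [word for slot in slot_order for _, word in structure.get(slot, [])]


# English glosses stand in for the Chinese lexicons.
sentence_structure = {
    1: [("subject", "he")],
    2: [("location", "at-school"), ("time", "yesterday")],
    3: [("verb", "ate")],
}
sequence_table = {2: ["time", "location"]}  # time before location within slot 2

words = line_up(adjust_item_locations(sentence_structure, sequence_table), [1, 2, 3])
print(words)
```

In this sketch the result depends entirely on the sequence table. A slot or case marker that is absent from the table is left in whatever order the earlier units produced.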
The drawbacks that are associated with the aforementioned conventional Chinese generation apparatus for machine translation are as follows:
1. A verb or adjective in Chinese may be generated into any of a plurality of Chinese basic sentence patterns. For example, the verb "{character pullout}" may be generated into the following basic sentence patterns (wherein S represents subject, V represents verb, O represents direct object or indirect object, and C represents complement):
SVOO: {character pullout}. (I gave a book to him.)
SVOOC: {character pullout}{character pullout}. (I gave him a book as a souvenir.)
SVOC: {character pullout}. (I saw him home.)
SVO: {character pullout}. (He will get killed.)
2. The translation quality cannot be improved, since the location of an unrestrained item is assigned in accordance with the content of the sentence item information memory unit and not in accordance with the state of the associated items. For example, if the location of the time lexicon "{character pullout}" (today) is assigned to 2, the conventional Chinese generation apparatus can only generate the sentence "{character pullout}." (I graduated today.), but cannot generate the sentence "{character pullout}" (Today I graduated.), which emphasizes "{character pullout}" (today).
3. Since the adjustment of the relative locations among the unrestrained items in the same slot depends on the content of the sentence item sequence memory unit, a strange or incorrect Chinese sentence may be generated when the content of the sentence item sequence memory unit is incomplete.
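The first two drawbacks listed above can be made concrete with a small sketch. The verb classification code "GIVE_CLASS", the table layouts and the English glosses are hypothetical; only the four pattern names and the slot number 2 for the time lexicon come from the text.

```python
from typing import Dict, List, Tuple

# Drawback 1: a single searching key maps to several basic sentence patterns,
# so the key by itself cannot decide which pattern to generate.
BASIC_SENTENCE_PATTERNS: Dict[str, List[str]] = {
    "GIVE_CLASS": ["SVOO", "SVOOC", "SVOC", "SVO"],  # hypothetical code
}


def lookup_patterns(verb_class_code: str) -> List[str]:
    """Return every basic sentence pattern stored for a verb classification code."""
    return BASIC_SENTENCE_PATTERNS.get(verb_class_code, [])


# Drawback 2: each item type has one fixed slot, so only one order can result.
FIXED_SLOT: Dict[str, int] = {"subject": 1, "time": 2, "verb": 3}


def place_by_fixed_slot(items: List[Tuple[str, str]]) -> List[str]:
    """Arrange (item type, word) pairs purely by their fixed slot numbers."""
    return [word for _, word in sorted(items, key=lambda item: FIXED_SLOT[item[0]])]


candidates = lookup_patterns("GIVE_CLASS")
print(len(candidates), "candidate patterns:", candidates)

# Whatever order the items arrive in, the output order is the same.
out_a = place_by_fixed_slot([("time", "today"), ("subject", "I"), ("verb", "graduated")])
out_b = place_by_fixed_slot([("subject", "I"), ("verb", "graduated"), ("time", "today")])
print(out_a, out_b)
```

With the time lexicon fixed at slot 2, the time-first order that emphasizes "today" can never be produced, and the pattern lookup leaves four candidates with nothing in the key to choose among them.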
a statistic information memory unit for storing the argument item of the dependency structure of the Chinese sentence, the possible sentence pattern, the possible case marker arrangement of each slot and the corresponding probability value;
an accessory item information memory unit for storing the case marker, the source language surface case marker, the argument semantic code, the semantic code of the modifier and the corresponding phrase head surface case marker and phrase tail surface case marker;
an accessory item generating unit for retrieving the case marker of the leaf node item, the source language surface case marker, the argument semantic code and the corresponding node item semantic code as searching key from the Chinese phrase structure, for retrieving the phrase head surface case marker and the phrase tail surface case marker from the accessory item information memory unit in accordance with the searching key, and for orderly generating the preposition structure for the Chinese phrase structure; and
a post processing unit for retrieving each clause structure from the Chinese phrase structure, for generating the question sentence or the "{character pullout}" ("ba") sentence or the negative sentence or the passive sentence or the imperative sentence and the corresponding tense marker and punctuation, and for converting the Chinese phrase structure into the Chinese sentence with the use of the lining approach.
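One way to picture a statistic information memory unit of the kind described above is a table that maps a set of argument items to candidate sentence patterns with probability values, from which the most probable candidate is selected. The following is a minimal sketch under that assumption. The argument labels, the table layout and every probability value are invented for illustration and are not data from the invention.

```python
from typing import Dict, List, Optional, Tuple

# Key: sorted tuple of argument items; value: (sentence pattern, probability).
# All entries are hypothetical.
PatternStats = Dict[Tuple[str, ...], List[Tuple[str, float]]]

TOY_STATS: PatternStats = {
    ("agent", "goal", "theme"): [("SVOO", 0.7), ("SVOC", 0.3)],
    ("agent", "theme"): [("SVO", 1.0)],
}


def most_probable_pattern(stats: PatternStats,
                          argument_items: List[str]) -> Optional[Tuple[str, float]]:
    """Pick the stored pattern with the highest probability for these arguments."""
    key = tuple(sorted(argument_items))
    candidates = stats.get(key)
    if not candidates:
        return None  # no statistics stored for this argument combination
    return max(candidates, key=lambda candidate: candidate[1])


best = most_probable_pattern(TOY_STATS, ["theme", "agent", "goal"])
print(best)
```

Sorting the argument items before the lookup makes the key independent of the order in which the arguments appear in the dependency structure. That is a design choice of this sketch and not something the text specifies.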
Therefore, the conventional Chinese generation apparatus for machine translation cannot resolve the differences among the basic sentence patterns by means of the verb classification code. This problem must be solved by a heuristic method, and thus the translation quality cannot be ensured.