1. Technical Field
This invention relates to techniques in machine translation. More particularly, the invention relates to a method for decomposing prose elements in document processing.
2. Description of the Prior Art
As research in linguistics has reported, sentence length is significant in reading comprehension. For example, when reading New York Times articles, the long sentences in the editorials are often difficult to process. Human consciousness in processing language is like a bird in flight: the more branches a bird has to perch on, the farther it can fly, and the more proper punctuation a text has, the more easily the human mind moves through it. For example, a five-word segment is easier to process than a ten-word segment, and simple sentences are easier to understand than complex sentences.
As taught in grammar school, a complex sentence typically consists of a main clause, coordinate clause(s), participle clause(s) and subordinate clause(s) in a number of combinations. Later in life, a human being carefully adapts to parsing complex sentences. It would be advantageous if this internal process for decomposing complex sentences could be articulated so that it may be applied to machine translation techniques.
In the current state of the art in machine translation, there is no capability to decompose complex sentences into simple segments that can be easily understood by the reader. The lack of this capability typically results in translations that are virtually undecipherable. FIG. 1 shows a search result from an Internet search query in English 10 and some machine translations 20 and 30 of the search result. The English version 10 of the search result reads as follows:
    The Paris MOU consists of 18 participating maritime Administrations and covers the waters of the European coastal States and the North Atlantic basin from North America to Europe. The Paris MOU aims at eliminating the operating of sub-standard ships through a harmonized system of port State control.
This search result consists of a description of the Paris MOU on Port State Control. As a Chinese reader can see, the machine-translated Chinese sentences 20 and 30 are virtually undecipherable because the Chinese words all run into each other with no break. These translations fail to segment the present participle clauses into understandable Chinese modules.
Research in linguistics finds that English and a number of other hypotactic languages are rich in cohesive ties. For example, the following sentences all have similar meanings but use different cohesive ties:
    When the baby cried, the mother picked it up.
    If the baby cried, the mother picked it up.
    Upon hearing the baby cry, the mother picked it up.
    Judging from the fact that the baby cried, the mother picked it up.
At the same time, Chinese and other paratactic languages emphasize oral tradition and/or abundant narrative, or were highly developed prior to printing technology. Speakers of these paratactic languages can infer the same meaning from just the following two simple sentences:
    The baby cried. The mother picked it up.
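The contrast above, between a sentence joined by a cohesive tie and the two simple sentences a paratactic reader needs, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method of the invention: the tie list `LEADING_TIES` and the function `strip_cohesive_tie` are hypothetical names, and the sketch handles only a sentence-initial tie followed by a comma.

```python
# Hypothetical, deliberately tiny inventory of sentence-initial cohesive ties.
# Longer ties come first so they are matched before shorter ones.
LEADING_TIES = (
    "judging from the fact that",
    "when",
    "if",
)


def strip_cohesive_tie(sentence):
    """Split 'TIE clause, main clause.' into two simple sentences.

    Returns the sentence unchanged (as a one-item list) when no known
    leading tie followed by a comma is found.
    """
    original = sentence.strip()
    text = original.rstrip(".")
    head, comma, tail = text.partition(",")
    if not comma:
        return [original]
    lowered = head.lower()
    for tie in LEADING_TIES:
        if lowered.startswith(tie + " "):
            clause = head[len(tie):].strip()
            main = tail.strip()
            if not clause or not main:
                return [original]
            # Capitalize each remaining clause and close it with a period.
            return [
                clause[0].upper() + clause[1:] + ".",
                main[0].upper() + main[1:] + ".",
            ]
    return [original]


if __name__ == "__main__":
    print(strip_cohesive_tie("When the baby cried, the mother picked it up."))
```

A string-prefix match of this kind discards the temporal, conditional or inferential nuance carried by the tie, which is exactly the information a paratactic reader is assumed to recover from context.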
What is therefore desired is a technology in machine translation that can pre-process complex sentences into manageable segments for ease of human understanding. Preferably, such technology should decompose complex sentences into simple sentences.
It is further desired that such technology be able to identify, isolate and strip out cohesive ties in a comparatively more hypotactic language for the benefit of people more accustomed to a paratactic language.
It is further desired that such technology can be applied to machine translation so that comparatively independent linguistic components, such as clauses and phrases, can be translated into a second language, and the translated results can be easily understood by speakers of the second language.