This invention relates to a language analysis system and to a method of machine-assisted language analysis.
Machine translation, in which a computer translates a natural language, has long been studied, and part of it has been put to practical use. Machine translation first analyzes the original language and then translates it into the other language. Language analysis is therefore an important step of machine translation, and the accuracy of the analysis determines the accuracy of the translation. Language analysis is not used only for machine translation; it is widely used in other language processing in general.
In conventional language analysis, it is difficult to identify the part of speech correctly when one word has two or more parts of speech. For example, in the method of Japanese Laid-Open Patent Publication (Tokukai) No. Hei 4-305769, the operator chooses the part of speech. This requires human judgment; the computer cannot identify the part of speech by itself. In the method of Japanese Laid-Open Patent Publication (Tokukai) No. Hei 5-290081, the usage frequency of each part of speech is stored beforehand for each subject field of the document in order to improve the accuracy of part-of-speech identification. This method achieves some improvement in accuracy. However, the dictionary is difficult to manage, and the improvement in accuracy is limited.
Machine translation has another major problem. When a word of the original language and its equivalent do not correspond one-to-one, correct translation becomes difficult. One approach is to register combinations of words in the equivalent dictionary. However, natural language contains too many such cases, and in practice such a dictionary could not be built. Even if such a dictionary could be built, it would be too large and lookup would be too slow. As a result, the approach is not practical.
Language analysis should not only analyze the part of speech and the attributes inherent to each token. It is also important to analyze the structure of the sentence and the role that each token plays in it. Systems that analyze the relation between one token and another have been proposed. However, no system has analyzed the role of a token in relation to the sentence structure.
For the analysis of sentence structure, there is the method of Japanese Laid-Open Patent Publication (Tokukai) No. Sho 62-262177, which discloses an analysis technique for extracting an inserted phrase from a sentence. However, this technique requires that examples of inserted phrases be stored in detail, and all examples must be stored before it can be put to practical use. It is therefore difficult to realize.
Next, Japanese Laid-Open Patent Publication (Tokukai) No. Sho 64-17152 discloses a method of analyzing the relations within a sentence using meaning category numbers (the common semantic concepts that words have). This method requires special meaning category numbers, which makes the system complicated. In addition, the relation cannot be determined uniquely.
There is also demand for a translation system that works among different languages. For example, U.S. Pat. No. 5,426,583 discloses a system that uses an artificial international language. However, that system lacks concrete detail, and it is difficult to realize at present. Nor has any proposal solved each of the problems described above.
This invention provides a language analysis system that solves the problems described above. It also provides systems that solve each of those problems individually.
The terms used to explain this invention are defined below.
“Language”: a natural language, such as the written and spoken words of documents, texts and so on. It may take any form, such as character codes, images or sound.
“Program which a computer executes”: this covers two cases, a program that is executed after being converted once (for example, decompressed) and a program that is executed in combination with another module.
The language analysis system and the language analysis method of this invention are described below.
The system divides a given language into tokens and, at the same time, obtains the part of speech of each token from a dictionary.
When one token has two or more parts of speech, the system refers to the part of speech of one or more tokens located before it, after it, or both, and chooses one part of speech from the two or more given to that token.
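The tokenization and part-of-speech selection described above can be pictured with a minimal sketch. The dictionary entries, the tag names and the context rules below are all hypothetical and chosen only for illustration; the specification does not give this rule table, and whitespace splitting stands in for whatever tokenizer the system actually uses.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary: token -> candidate parts of speech.
DICTIONARY: Dict[str, List[str]] = {
    "the": ["DET"],
    "we": ["PRON"],
    "can": ["AUX", "NOUN"],   # ambiguous token
    "rusts": ["VERB"],
    "go": ["VERB"],
}

# Hypothetical context rules: (POS before, POS after, POS to choose).
# None means "any neighbor, or no neighbor at all".
RULES: List[Tuple[Optional[str], Optional[str], str]] = [
    ("DET", None, "NOUN"),
    ("PRON", None, "AUX"),
    (None, "VERB", "AUX"),
]


def tokenize(text: str) -> List[str]:
    """Split on whitespace (an assumption; real tokenization is language-specific)."""
    return text.lower().split()


def lookup(tokens: List[str]) -> List[Tuple[str, List[str]]]:
    """Attach the candidate parts of speech found in the dictionary."""
    return [(t, DICTIONARY.get(t, ["UNKNOWN"])) for t in tokens]


def disambiguate(entries: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """Choose one part of speech per token using the neighboring tokens."""
    result: List[Tuple[str, str]] = []
    for i, (token, candidates) in enumerate(entries):
        if len(candidates) == 1:
            result.append((token, candidates[0]))
            continue
        # The preceding token has already been resolved.
        prev_pos = result[i - 1][1] if i > 0 else None
        # The following token is used only if it is unambiguous.
        following = entries[i + 1][1] if i + 1 < len(entries) else []
        next_pos = following[0] if len(following) == 1 else None
        chosen = candidates[0]  # fallback when no rule applies
        for rule_prev, rule_next, pos in RULES:
            if pos not in candidates:
                continue
            if rule_prev is not None and rule_prev != prev_pos:
                continue
            if rule_next is not None and rule_next != next_pos:
                continue
            chosen = pos
            break
        result.append((token, chosen))
    return result


print(disambiguate(lookup(tokenize("The can rusts"))))
print(disambiguate(lookup(tokenize("We can go"))))
```

In this sketch the same token "can" is resolved to a noun after a determiner and to an auxiliary after a pronoun, which is the kind of neighbor-based choice the paragraph describes.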
When the part of speech of a token is the root of a predicate, the system determines the grammatical attributes of that predicate based on the suffix of the predicate.
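As a rough illustration of determining attributes from a predicate suffix, the sketch below looks up the part of the surface form that follows the root in a small table. The table, the attribute names and the romanized Japanese forms are assumptions made for this example only; the specification does not list them here.

```python
from typing import Dict, Optional

# Hypothetical suffix table: suffix -> grammatical attributes of the predicate.
SUFFIX_ATTRIBUTES: Dict[str, Dict[str, object]] = {
    "mashita": {"tense": "past", "polite": True},
    "masu": {"tense": "non-past", "polite": True},
    "ta": {"tense": "past", "polite": False},
    "ru": {"tense": "non-past", "polite": False},
}


def predicate_attributes(root: str, surface: str) -> Optional[Dict[str, object]]:
    """Return the attributes implied by the suffix that follows the root.

    Returns None when the suffix is not in the table.
    """
    if not surface.startswith(root):
        raise ValueError(f"{surface!r} does not start with root {root!r}")
    suffix = surface[len(root):]
    return SUFFIX_ATTRIBUTES.get(suffix)


print(predicate_attributes("tabe", "tabemashita"))
print(predicate_attributes("tabe", "taberu"))
```

The design choice here is that the root is already known from the dictionary lookup, so the suffix can be isolated by simple string subtraction rather than by pattern matching.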
The system determines the role of one or more tokens that have not yet been given a role from two things: the roles of one or more tokens that already have a role, and the parts of speech of the tokens that do not. At the same time, it relates the two groups of tokens to each other.
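One way to picture this step is a rule table keyed by the role of a token that already has one and the part of speech of a token that does not. The sketch below is only an assumption-laden illustration: the `Token` class, the role names, the rule table and the choice to look only at the immediately following token are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    text: str
    pos: str
    role: Optional[str] = None       # None until a role is determined
    linked_to: Optional[int] = None  # index of the token this one is related to


# Hypothetical rules: (role of the following token, POS of this token) -> role.
ROLE_RULES: Dict[Tuple[str, str], str] = {
    ("PREDICATE", "NOUN"): "SUBJECT",
    ("SUBJECT", "ADJ"): "MODIFIER",
}


def assign_roles(tokens: List[Token]) -> List[Token]:
    """Propagate roles from tokens that have one to neighbors that do not."""
    changed = True
    while changed:  # repeat until a full pass assigns nothing new
        changed = False
        for i, tok in enumerate(tokens):
            if tok.role is not None or i + 1 >= len(tokens):
                continue
            following = tokens[i + 1]
            if following.role is None:
                continue
            role = ROLE_RULES.get((following.role, tok.pos))
            if role is not None:
                tok.role = role
                tok.linked_to = i + 1  # relate the two tokens
                changed = True
    return tokens


sentence = [
    Token("big", "ADJ"),
    Token("dogs", "NOUN"),
    Token("bark", "VERB", role="PREDICATE"),
]
for tok in assign_roles(sentence):
    print(tok.text, tok.role, tok.linked_to)
```

Each assignment both fixes a role and records a link, mirroring the statement that the role is determined and the tokens are related at the same time.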
The system extracts the main sentence and each subordinate sentence and relates them to the other parts. It relates each “subject” and “predicate” to the main sentence or to one of the subordinate sentences (the whole analysis), based on the positions at which “subject” and “predicate” appear and the number of times they appear. This analysis makes the subordinate relations of the subordinate sentences clear. If the earlier analysis contains a mistake, the system corrects it.
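The specification does not spell out the pairing procedure at this point, so the following is only a sketch under strong assumptions: a predicate-final word order in which subordinate sentences are nested inside the main sentence, a sequence of role labels produced by the earlier steps, and a stack that pairs each predicate with the most recent unpaired subject. The function name and the output format are hypothetical.

```python
from typing import Dict, List, Optional


def pair_clauses(roles: List[str]) -> List[Dict[str, object]]:
    """Pair subjects with predicates by appearance position and count.

    Assumption: each predicate closes the innermost open clause, and the
    last predicate in the sequence belongs to the main sentence.
    """
    open_subjects: List[int] = []  # positions of subjects not yet paired
    pairs: List[Dict[str, object]] = []
    for position, role in enumerate(roles):
        if role == "SUBJECT":
            open_subjects.append(position)
        elif role == "PREDICATE":
            subject: Optional[int] = open_subjects.pop() if open_subjects else None
            pairs.append({"subject": subject, "predicate": position})
    for k, pair in enumerate(pairs):
        pair["kind"] = "main" if k == len(pairs) - 1 else "subordinate"
    return pairs


roles = ["SUBJECT", "SUBJECT", "PREDICATE", "OBJECT", "PREDICATE"]
for clause in pair_clauses(roles):
    print(clause)
```

Under these assumptions, a mismatch between the number of subjects and predicates (for example, a predicate paired with `None`) is the kind of signal that could prompt a correction of the earlier analysis.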
In the manner described above, the system determines the part of speech of each token correctly and then combines tokens into quasi words, each of which has one role. That is, it analyzes the part of speech token by token and then analyzes the role of each token or quasi word from that part of speech.
The system also matches the analysis results for tokens or quasi words against the parts of speech of tokens that are still undetermined, and in this way analyzes the role and structure of the undetermined tokens (the local analysis). It then corrects the local analysis using the result of the whole analysis. The structure of the sentence and the roles within it can therefore be analyzed more accurately.
In this language analysis system and method, the main sentence and each subordinate sentence are extracted, and a sentence pattern is analyzed for each. This makes it easy to classify sentence patterns into types and makes correct analysis possible.
The language analysis system and method of this invention separate the local analysis from the whole analysis. Processing is therefore simplified and a correct analysis is achieved.
The language analysis system of this invention performs the whole analysis after the local analysis and then corrects the local analysis as necessary. Even a complicated sentence can therefore be analyzed correctly.