1. Field of the Invention
The present invention relates to a communication support system, a communication support method, and a computer program for supporting communication among a large number of different languages and in particular to a communication support system, a communication support method, and a computer program for supporting communication among a large number of different languages using an interlingua system for first converting a source language into an intermediate language independent of a specific language and then converting the intermediate language into a target language.
More particularly, the present invention relates to a communication support system, a communication support method, and a computer program that can be used with any language and can be constructed with a small number of steps, and in particular to a communication support system, a communication support method, and a computer program for supporting communication among a large number of different languages by representing an intermediate language in a more understandable form.
2. Description of the Related Art
A language that people use for everyday communication with one another, such as Japanese or English, is called a "natural language." Natural languages arose almost spontaneously and have evolved along with the history of humankind, of ethnic groups, and of societies; many different natural languages exist today. People can of course also communicate by gesture, but natural language enables the most natural and advanced communication between people.
Natural language is inherently abstract and highly ambiguous, but it can be processed by a computer by handling text mathematically. As a result, various automated natural language applications, such as machine translation, interactive (dialogue) systems, and search systems, can be realized.
Among these, "machine translation" is a system that supports communication between people who use different languages by making the most of computer processing.
Commercially practical machine translation systems at present are based either on the direct machine translation system (henceforth "direct system") or on the transfer-based machine translation system (henceforth "transfer system").
Basically, the direct system simply replaces words of a source language with words of a target language. It is effective only when the grammar of the source language is similar to that of the target language, as in Japanese-Korean translation.
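The word-for-word replacement performed by the direct system can be sketched as follows. This is a minimal illustration, not any commercial system: the function `direct_translate` and the toy lexicon `EN_JA` are hypothetical, and the example deliberately uses a language pair with different word order to show why the approach breaks down.

```python
def direct_translate(sentence: str, lexicon: dict[str, str]) -> str:
    """Direct system: replace each source word with its target word, keeping source word order."""
    words = sentence.rstrip(".").split()
    # Words missing from the lexicon are passed through unchanged.
    return " ".join(lexicon.get(word.lower(), word) for word in words)


# Hypothetical toy English-to-Japanese lexicon (romanized), for illustration only.
EN_JA = {"i": "watashi", "study": "benkyousuru", "english": "Eigo"}

print(direct_translate("I study English.", EN_JA))
# -> watashi benkyousuru Eigo
```

The output keeps the English subject-verb-object order, whereas Japanese places the verb last, so plain word replacement is only adequate when the two grammars are already close.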
The transfer system, on the other hand, replaces syntactic structures as well as words. As an example, consider an English-Japanese translation system translating English sentence (1) into Japanese sentence (2).
(1) It is important to study English every day.
(2) Eigowo mainichi benkyousurunoha jyuuyouda. (romanized Japanese sentence)
The syntactic structure of (1) differs greatly from that of (2). The transfer system therefore first converts (1) into English sentence (3), which can easily be translated into Japanese, and then converts (3) into Japanese.
(3) To study English every day is important.
That is, the transfer system requires one conversion rule set that converts a source language sentence into "a source language sentence easily translated into a target language sentence" (within the same language), and another that converts the syntactic structure of that source language sentence into the syntactic structure of a target language sentence. For both rule sets, a change in either the source language or the target language requires entirely different conversion rules. Like the direct system, the transfer system also requires a word dictionary to convert words of the source language into words of the target language.
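The two rule sets can be illustrated with the example sentences (1) to (3). The sketch below is hypothetical: the functions `restructure` and `transfer_en_ja`, the toy lexicon `LEX`, and the use of regular expressions are assumptions made for brevity (practical transfer systems operate on parse trees, not surface strings), and the rules cover only this one sentence pattern.

```python
import re


def restructure(sentence: str) -> str:
    """Stage 1 (within English): rewrite 'It is ADJ to VP.' as 'To VP is ADJ.'"""
    m = re.fullmatch(r"It is (\w+) to (.+)\.", sentence)
    if m is None:
        return sentence  # no rule applies; leave the sentence unchanged
    adj, vp = m.groups()
    return f"To {vp} is {adj}."


def transfer_en_ja(sentence: str, lexicon: dict[str, str]) -> str:
    """Stage 2 (English -> Japanese): map 'To V OBJ ADV is ADJ.' onto Japanese order."""
    m = re.fullmatch(r"To (\w+) (\w+) (.+) is (\w+)\.", sentence)
    if m is None:
        raise ValueError(f"no transfer rule for: {sentence!r}")
    verb, obj, adv, adj = (lexicon[w] for w in m.groups())
    # Japanese order: object + "wo", adverbial, nominalized verb + "noha", adjective + "da".
    return f"{obj}wo {adv} {verb}noha {adj}da."


# Hypothetical toy lexicon (romanized Japanese), for illustration only.
LEX = {"study": "benkyousuru", "English": "Eigo", "every day": "mainichi", "important": "jyuuyou"}

s3 = restructure("It is important to study English every day.")
print(s3)                       # To study English every day is important.
print(transfer_en_ja(s3, LEX))  # Eigowo mainichi benkyousurunoha jyuuyouda.
```

Both functions are specific to English as the source and Japanese as the target; supporting another pair would require writing new rules for both stages.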
As a third machine translation approach, a technique called the interlingua-based machine translation system (henceforth "interlingua system") has been proposed. In the interlingua system, a source language is converted into an intermediate (interlingua) language independent of any specific language, and the intermediate language is then converted into a target language. One example of an intermediate language is the f(unctional)-structure obtained as the result of syntactic analysis based on a grammar theory called Lexical Functional Grammar (LFG).
In LFG, the linguistic knowledge of a native speaker, namely the grammar, is formalized as a component separate from the non-grammatical parameters that affect computational processing and its operation. Details of LFG are described, for example, in R. M. Kaplan and J. Bresnan, "Lexical-Functional Grammar: A Formal System for Grammatical Representation," The MIT Press, Cambridge (1982), reprinted in Formal Issues in Lexical-Functional Grammar, pp. 29-130, CSLI Publications, Stanford University (1995). The f-structure represents grammatical functions explicitly and is made up of grammatical function names, semantic forms, and feature symbols. By referring to the f-structure, one can obtain a semantic understanding of the subject, object, complement, and adjuncts of a sentence.
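As a rough illustration of how grammatical functions can be read off an f-structure, the sketch below encodes a simplified f-structure as nested Python dictionaries. The encoding, the function `grammatical_functions`, and the example structure are assumptions for illustration only; actual LFG f-structures are attribute-value matrices whose PRED values are semantic forms such as 'see<(↑SUBJ)(↑OBJ)>'.

```python
# Grammatical functions an f-structure attribute may name (a simplified subset of LFG's inventory).
GOVERNABLE = ("SUBJ", "OBJ", "OBJ2", "COMP", "XCOMP", "OBL")


def grammatical_functions(fs: dict) -> dict:
    """Return the PRED of each grammatical function found at the top level of an f-structure."""
    found = {}
    for attr, value in fs.items():
        if attr in GOVERNABLE:
            found[attr] = value["PRED"]
        elif attr == "ADJUNCT":
            # Adjuncts form a set of f-structures; a list stands in for the set here.
            found[attr] = [adj["PRED"] for adj in value]
    return found


# Simplified f-structure for "John saw Mary yesterday." (hand-written, illustrative only).
F = {
    "PRED": "see<SUBJ,OBJ>",
    "TENSE": "past",
    "SUBJ": {"PRED": "John", "NUM": "sg", "PERS": 3},
    "OBJ": {"PRED": "Mary", "NUM": "sg", "PERS": 3},
    "ADJUNCT": [{"PRED": "yesterday"}],
}

print(grammatical_functions(F))
# -> {'SUBJ': 'John', 'OBJ': 'Mary', 'ADJUNCT': ['yesterday']}
```

Feature symbols such as TENSE, NUM, and PERS are simply carried along as attributes; only the function names are interpreted here.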
A machine translation system using the f-structure as an intermediate language is described in detail in A. Frank, "From Parallel Grammar Development towards Machine Translation," in Proceedings of MT Summit VII, "MT in the Great Translation Era," Singapore, pp. 134-142 (1999). A general description of the three systems is given in Hozumi Tanaka, "Natural Language Processing and Its Application," The Institute of Electronics, Information and Communication Engineers (1999).
The following considers the use of machine translation to support communication among a large number of different languages.
The direct system requires a word dictionary for converting words of a source language into words of a target language for each combination of source and target languages. Likewise, the transfer system requires, in addition to the word dictionary, a syntax dictionary for each combination of source and target languages (a conversion rule set to convert the syntax of a source language sentence S into the syntax of a source language sentence S′ that is easily converted into a target language sentence, and a conversion rule set to convert S′ into the syntax of a target language sentence S″).
Thus, to support communication among $n$ different languages, $n^2 - n = n(n-1)$ dictionaries (machine translation systems) must be constructed, one for each ordered pair of source and target languages. For example, to support communication among 10 different languages, 90 ($= {}_{10}P_2 = 10 \times 9$) systems need to be constructed (see FIG. 2).
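The pairwise count can be checked by enumerating ordered (source, target) pairs. This is a small sketch; `pairwise_systems` and the placeholder language names are hypothetical.

```python
from itertools import permutations


def pairwise_systems(languages: list[str]) -> list[tuple[str, str]]:
    """Direct/transfer approach: one dictionary (system) per ordered (source, target) pair."""
    return list(permutations(languages, 2))


langs = [f"lang{i}" for i in range(10)]  # ten placeholder language names
systems = pairwise_systems(langs)
n = len(langs)
assert len(systems) == n**2 - n == n * (n - 1)
print(len(systems))  # 90
```

Ordered pairs are used because an English-to-Japanese system and a Japanese-to-English system are separate systems.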
However, even constructing a single word/syntax dictionary (translation system) requires an enormous number of steps. It is therefore extremely difficult to construct dictionaries or translation systems for all language combinations.
If such dictionary construction is carried out between languages with large language populations, such as English, Chinese, German, French, and Japanese, the difficulty is relatively low because rich language resources, such as word dictionaries for translation, are available. However, when the source language, the target language, or both have a small language population, the scholarly resources for language processing are inevitably poor, and it is practically impossible to construct the word dictionary or syntax dictionary needed to implement a translation system. Therefore, if the transfer system or the direct system is adopted, it is extremely difficult to construct a system that supports a person using a language with a small language population in communicating with a person using any other language.
Even if the interlingua system is adopted, each conversion system between the intermediate language and a particular language depends on that language. Therefore, to support communication among $n$ different languages, $2n$ entirely different conversion systems (machine translation systems) need to be constructed: $n$ systems converting each language into the intermediate language and $n$ systems converting the intermediate language into each language. For example, to support communication among 10 different languages, 20 translation systems need to be constructed (see FIG. 3). As with the transfer system and the direct system, it is practically impossible to construct such conversion systems for a language with a small language population.
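The $2n$ count follows from composing one analyzer and one generator per language. The sketch below is hypothetical: modules are represented only by labels (`interlingua_modules`, `route`, and the language names are illustrative), and no actual conversion is performed.

```python
from itertools import permutations


def interlingua_modules(languages: list[str]) -> set[tuple[str, str]]:
    """Interlingua approach: one analyzer (language -> IL) and one generator (IL -> language) per language."""
    analyzers = {("analyze", lang) for lang in languages}
    generators = {("generate", lang) for lang in languages}
    return analyzers | generators


def route(source: str, target: str) -> list[tuple[str, str]]:
    """Modules chained to translate source -> target through the intermediate language."""
    return [("analyze", source), ("generate", target)]


langs = [f"lang{i}" for i in range(10)]  # ten placeholder language names
modules = interlingua_modules(langs)
assert len(modules) == 2 * len(langs)  # 20 for ten languages
# Every ordered language pair is served by composing two of those 2n modules.
assert all(set(route(s, t)) <= modules for s, t in permutations(langs, 2))

for n in (2, 3, 10):
    print(f"n={n}: pairwise={n * (n - 1)}, interlingua={2 * n}")
# n=2: pairwise=2, interlingua=4
# n=3: pairwise=6, interlingua=6
# n=10: pairwise=90, interlingua=20
```

Since $n(n-1) > 2n$ exactly when $n > 3$, the interlingua system needs fewer systems from four languages upward, but each of its $2n$ modules is still language-specific.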