With the continuing growth of multinational business dealings, in which the global economy brings together business people of all nationalities, and with the ease and frequency of today's travel between countries, there is a compelling need for a machine-aided interpersonal communication system that provides accurate, near real-time language translation. Such a system would relieve users of the need to possess specialized linguistic or translation knowledge.
A typical language translation system functions by using natural language processing. Natural language processing is generally concerned with the attempt to recognize a large pattern or sentence by decomposing it into small sub-patterns according to linguistic rules. A natural language processing system uses considerable knowledge about the structure of the language, including what the words are, how words combine to form sentences, what the words mean, and how word meanings contribute to sentence meanings.
Morphological knowledge concerns how words are constructed from more basic units called morphemes. Syntactic knowledge concerns how words can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are sub-parts of what other phrases. Typical syntactic representations of language are based on the notion of context-free grammars, which represent sentence structure in terms of what phrases are sub-parts of other phrases. This syntactic information is often presented in a tree form. Typically, semantic knowledge concerns what words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning--the meaning a sentence has regardless of the context in which it is used. The representation of the context-independent meaning of a sentence is called its logical form. The logical form encodes possible word senses and identifies the semantic relationships between the words and phrases.
Natural language processing systems further include interpretation processes that map from one representation to another. For instance, the process that maps a sentence to its syntactic structure and logical form is called parsing, and it is performed by a component called a parser. The parser uses knowledge about words and word meanings (the lexicon) and a set of rules defining the legal structures (the grammar) in order to assign a syntactic structure and a logical form to an input sentence.
Formally, a context-free grammar of a language is a four-tuple comprising nonterminal vocabularies, terminal vocabularies, a finite set of production rules, and a starting symbol for all productions. The nonterminal and terminal vocabularies are disjoint. The set of terminal symbols is called the vocabulary of the language.
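The four-tuple definition above can be illustrated with a minimal Python sketch. The class name `Grammar`, the helper `derives`, and the toy English grammar are all hypothetical and chosen only for illustration; the naive recognizer further assumes a grammar with no empty productions and no cyclic unit productions, and is not the parser of any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar as a four-tuple (hypothetical illustration)."""
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # pairs of (left-hand symbol, right-hand tuple of symbols)
    start: str

    def __post_init__(self):
        # The nonterminal and terminal vocabularies must be disjoint.
        if self.nonterminals & self.terminals:
            raise ValueError("nonterminal and terminal vocabularies overlap")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for sym in rhs:
                if sym not in self.nonterminals and sym not in self.terminals:
                    raise ValueError(f"unknown symbol {sym!r}")


def derives(g, symbol, tokens):
    """Return True if `symbol` derives exactly the tuple `tokens`.

    Naive top-down search; assumes no empty productions and no cyclic
    unit productions, so every symbol consumes at least one token.
    """
    if symbol in g.terminals:
        return tokens == (symbol,)
    return any(_derives_seq(g, rhs, tokens)
               for lhs, rhs in g.productions if lhs == symbol)


def _derives_seq(g, rhs, tokens):
    """Return True if the symbol sequence `rhs` derives `tokens`."""
    if not rhs:
        return not tokens
    head, rest = rhs[0], rhs[1:]
    # Try every split that leaves at least one token per remaining symbol.
    for i in range(1, len(tokens) - len(rest) + 1):
        if derives(g, head, tokens[:i]) and _derives_seq(g, rest, tokens[i:]):
            return True
    return False


# Toy grammar (an assumption for illustration only).
TOY = Grammar(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "dog", "cat", "sees"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("Det", ("the",)),
        ("N", ("dog",)),
        ("N", ("cat",)),
        ("V", ("sees",)),
    ),
    start="S",
)
```

Under these assumptions, `derives(TOY, TOY.start, ("the", "dog", "sees", "the", "cat"))` checks whether the word sequence is a sentence of the toy language. A practical parser would also return the tree structure and would use a more efficient algorithm than this exhaustive search.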
A natural language processor receives an input sentence in a source language, lexically separates the words in the sentence, syntactically determines the types of words, semantically understands the words, and creates an output sentence in a target language that contains the content of the input sentence. The natural language processor employs many types of knowledge and stores different types of knowledge in different knowledge structures that separate the knowledge into organized types.
In transferring a linguistic representation of a source language (such as English) to the linguistic representation for a target language (such as Japanese), a significant amount of linguistic knowledge needs to be incorporated in order to achieve a high-quality translation. In one prior method, a transfer-driven approach was developed. This method used transfer rules that operated at a string, pattern, or semantic grammar rule level. The input sentence was analyzed using the transfer rules, and the rules that produced the best analyses were used to generate the target-language output. In addition, example expressions were used to annotate the transfer rules directly.
In another prior method, a dependency tree representation was used to store examples of the source linguistic structures. During transfer, this method selected a set of example fragments that completely covered the input. The target-language expression was then constructed from the target-language portions of the selected fragments. However, the dependency trees created by this method are not detailed enough to account for many natural language expressions. The method also requires exact matches between the input and the examples. Because of the variability of natural languages, exact matches are hard to achieve or require extremely large databases of examples.
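The exact-match limitation can be seen in a small Python sketch. This is a deliberate simplification and not the prior method itself: it matches flat word sequences rather than dependency trees, and the function name `cover`, the example database, and the placeholder target labels (`T1`, `T2`, `T3`) are all hypothetical.

```python
def cover(tokens, examples):
    """Return the target fragments of a complete exact-match cover, or None.

    `examples` maps a source fragment (tuple of words) to a target-language
    fragment. A flat word-sequence stand-in for dependency-tree fragments.
    """
    n = len(tokens)
    # best[i] holds the target fragments covering tokens[i:], or None.
    best = [None] * (n + 1)
    best[n] = []
    for i in range(n - 1, -1, -1):
        # Prefer the longest example fragment that starts at position i.
        for j in range(n, i, -1):
            fragment = tuple(tokens[i:j])
            if fragment in examples and best[j] is not None:
                best[i] = [examples[fragment]] + best[j]
                break
    return best[0]


# Hypothetical example database with placeholder target labels.
EXAMPLES = {
    ("good", "morning"): "T1",
    ("thank", "you"): "T2",
    ("good",): "T3",
}
```

With this database, `cover("good morning thank you".split(), EXAMPLES)` finds a complete cover, while `cover("good morning thanks you".split(), EXAMPLES)` returns `None`: a single unmatched word defeats the whole cover, which suggests why such a method needs a very large example database.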
What is required is a method and system that combines the ease and accuracy of the example-based method with the ability to manipulate the transfer rules to allow for a variety of attempts at translation.