This invention relates generally to natural language translation, and more particularly to manipulating linguistic structures that represent natural language expressions.
With the continuing growth of multinational business dealings, in which the global economy brings together business people of all nationalities, and with the ease and frequency of today's travel between countries, there is a compelling demand for a machine-aided interpersonal communication system that provides accurate, near real-time language translation. Such a system would relieve users of the need to possess specialized linguistic or translational knowledge.
A typical language translation system functions by using natural language processing. Natural language processing is generally concerned with the attempt to recognize a large pattern or sentence by decomposing it into small subpatterns according to linguistic rules. A natural language processing system uses considerable knowledge about the structure of the language, including what the words are, how words combine to form sentences, what the words mean, and how word meanings contribute to sentence meanings.
Morphological knowledge concerns how words are constructed from more basic units called morphemes. Syntactic knowledge concerns how words can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases. Typical syntactic representations of language are based on the notion of context-free grammars, which represent sentence structure in terms of what phrases are subparts of other phrases. This syntactic information is often presented in a tree form. Semantic knowledge concerns what words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning, that is, the meaning a sentence has regardless of the context in which it is used.
Natural language processing systems further comprise interpretation processes that map from one representation to the other. For instance, the process that maps a sentence to its syntactic structure and/or logical form is called parsing, and it is performed by a component called a parser. The parser uses knowledge about words and word meanings (the lexicon) and a set of rules defining the legal structures (the grammar) in order to assign a syntactic structure and a logical form to an input sentence.
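The relationship between lexicon, grammar, and parser described above can be illustrated with a minimal sketch. The toy lexicon, the toy rules, and the function names `parse_symbol`, `parse_sequence`, and `parse` are assumptions made for illustration only and do not come from the specification. The sketch assigns a syntactic structure only; it does not build a logical form, and it assumes the grammar has no left-recursive rules.

```python
# Toy lexicon: maps each word to a lexical category (hypothetical data).
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "sees": "V"}

# Toy grammar: maps each phrase category to its legal expansions (hypothetical data).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("V",)],
}


def parse_symbol(symbol, words, pos):
    """Yield (tree, next_pos) for each way `symbol` can cover words[pos:next_pos]."""
    # Case 1: the symbol is the lexical category of the current word.
    if pos < len(words) and LEXICON.get(words[pos]) == symbol:
        yield (symbol, words[pos]), pos + 1
    # Case 2: the symbol expands into a sequence of other symbols.
    for rhs in GRAMMAR.get(symbol, ()):
        for children, end in parse_sequence(rhs, words, pos):
            yield (symbol, *children), end


def parse_sequence(symbols, words, pos):
    """Yield (subtrees, next_pos) for each way to match `symbols` in order."""
    if not symbols:
        yield (), pos
        return
    for first, mid in parse_symbol(symbols[0], words, pos):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield (first, *rest), end


def parse(sentence, start="S"):
    """Return the first tree that covers the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse_symbol(start, words, 0):
        if end == len(words):
            return tree
    return None


tree = parse("The dog sees the cat")
```

Each tree is a nested tuple whose first element is the category and whose remaining elements are subtrees or, for a lexical category, the word itself. A sentence that the toy grammar cannot derive yields `None`.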
Formally, a context-free grammar of a language is a four-tuple comprising a nonterminal vocabulary, a terminal vocabulary, a finite set of production rules, and a starting symbol for all productions. The nonterminal and terminal vocabularies are disjoint. The set of terminal symbols is called the vocabulary of the language.
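As an illustration of this four-tuple, the following sketch represents a grammar as a small data structure and checks the requirement that the two vocabularies share no symbols. The class name `ContextFreeGrammar`, its field names, and the toy English fragment are assumptions chosen for the example, not part of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFreeGrammar:
    """A context-free grammar as a four-tuple (hypothetical representation)."""

    nonterminals: frozenset  # nonterminal vocabulary
    terminals: frozenset     # terminal vocabulary (the vocabulary of the language)
    productions: tuple       # finite set of rules, each (lhs, (rhs symbols...))
    start: str               # starting symbol

    def __post_init__(self):
        # The two vocabularies must not share any symbol.
        if self.nonterminals & self.terminals:
            raise ValueError("nonterminal and terminal vocabularies overlap")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            # In a context-free rule the left-hand side is a single nonterminal.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"rule for {lhs!r} uses an unknown symbol")


# Toy English fragment (hypothetical data).
TOY = ContextFreeGrammar(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "dog", "cat", "sees"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("Det", ("the",)),
        ("N", ("dog",)),
        ("N", ("cat",)),
        ("V", ("sees",)),
    ),
    start="S",
)
```

Validating the tuple at construction time means that a grammar with overlapping vocabularies or a malformed rule is rejected before any parsing is attempted.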
The typical natural language processor, however, has realized only limited success because such processors are complex and require that their creator have extensive knowledge of the natural languages to be translated as well as proficiency in the programming language used to implement the processor. Furthermore, typical natural language processors are dedicated to translating between two specific languages and in one direction only, making them ineffective when both languages need to be translated simultaneously, and rendering them useless when a translation into a different language is required.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
Language-neutral methods for syntactic analysis, transfer, and morphological and syntactical generation of feature structures are utilized by a natural language translation system to translate an input expression in a source language into an output expression in a target language. The language-neutral methods are driven by language-specific grammars to translate between the specified languages so that no knowledge about the languages need be incorporated into modules that implement the methods. The modules interface with the grammar rules in the form of compiled statements in a grammar programming language (GPL) that perform the required manipulation of the feature structures.
Because the modules are language-neutral, the creator of the modules need only know how to access the compiled grammar rules; no in-depth knowledge of the GPL or the natural language is required. Similarly, the creator of the natural language grammar rules only has to be proficient in the GPL and understand one or both of the natural languages to be translated. Furthermore, because the methods and modules are language-neutral, the system is readily adaptable to new languages simply by providing a grammar for the new language. Additionally, because each module and a corresponding grammar for a particular language can be encapsulated, multiple instances of the same module for different languages can execute simultaneously in the system, enabling it to simultaneously translate multiple languages.
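The separation between a language-neutral module and a language-specific grammar might be pictured roughly as follows. This is a sketch under stated assumptions, not the GPL or the modules of the invention: ordinary Python callables stand in for compiled grammar statements, plain dictionaries stand in for feature structures, and every name (`GenerationModule`, `make_lexical_rule`, `make_particle_rule`, `make_order_rule`) and every toy lexicon entry is hypothetical.

```python
from typing import Callable, Dict, List

FeatureStructure = Dict[str, object]
Rule = Callable[[FeatureStructure], FeatureStructure]


class GenerationModule:
    """Language-neutral: it only invokes rules and never inspects what they encode."""

    def __init__(self, rules: List[Rule]):
        self._rules = list(rules)  # each instance encapsulates its own grammar

    def run(self, fs: FeatureStructure) -> FeatureStructure:
        for rule in self._rules:
            fs = rule(fs)  # each rule returns a new feature structure
        return fs


def make_lexical_rule(lexicon):
    """Rule that chooses a word for the concept filling each role."""
    def rule(fs):
        words = {role: lexicon[concept] for role, concept in fs["roles"].items()}
        return {**fs, "words": words}
    return rule


def make_particle_rule(particles):
    """Rule that appends a case particle to the word filling certain roles."""
    def rule(fs):
        words = {
            role: f"{word} {particles[role]}" if role in particles else word
            for role, word in fs["words"].items()
        }
        return {**fs, "words": words}
    return rule


def make_order_rule(order):
    """Rule that linearizes the words according to a role order."""
    def rule(fs):
        return {**fs, "surface": " ".join(fs["words"][role] for role in order)}
    return rule


# Toy grammars (hypothetical data): only these differ between languages.
ENGLISH = [
    make_lexical_rule({"DOG": "the dog", "CAT": "the cat", "SEE": "sees"}),
    make_order_rule(("subj", "verb", "obj")),
]
JAPANESE = [
    make_lexical_rule({"DOG": "inu", "CAT": "neko", "SEE": "miru"}),
    make_particle_rule({"subj": "ga", "obj": "o"}),
    make_order_rule(("subj", "obj", "verb")),
]

# Two instances of the same module coexist, one per language.
english_gen = GenerationModule(ENGLISH)
japanese_gen = GenerationModule(JAPANESE)

meaning = {"roles": {"subj": "DOG", "verb": "SEE", "obj": "CAT"}}
english = english_gen.run(meaning)["surface"]    # "the dog sees the cat"
japanese = japanese_gen.run(meaning)["surface"]  # "inu ga neko o miru"
```

In this sketch, supporting a further language would mean supplying another list of rules; the `GenerationModule` class itself would not change.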
The present invention describes systems, computers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.