This invention relates generally to natural language translation, and more particularly to compiling grammars used to translate a natural language.
With the continuing growth of multinational business dealings, in which the global economy brings together business people of all nationalities, and with the ease and frequency of today's travel between countries, there is a compelling need for a machine-aided interpersonal communication system that provides accurate, near real-time language translation. Such a system would relieve users of the need to possess specialized linguistic or translation knowledge.
A typical language translation system functions by using natural language processing. Natural language processing is generally concerned with the attempt to recognize a large pattern or sentence by decomposing it into small sub-patterns according to linguistic rules. A natural language processing system uses considerable knowledge about the structure of the language, including what the words are, how words combine to form sentences, what the words mean, and how word meanings contribute to sentence meanings.
Morphological knowledge concerns how words are constructed from more basic units called morphemes. Syntactic knowledge concerns how words can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases. Typical syntactic representations of language are based on the notion of context-free grammars, which represent sentence structure in terms of what phrases are subparts of other phrases. This syntactic information is often presented in a tree form. Semantic knowledge concerns what words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning, that is, the meaning a sentence has regardless of the context in which it is used.
Natural language processing systems further include interpretation processes that map from one representation to another. For instance, the process that maps a sentence to its syntactic structure and/or logical form is called parsing, and it is performed by a component called a parser. The parser uses knowledge about words and word meanings (the lexicon) and a set of rules defining the legal structures (the grammar) in order to assign a syntactic structure and a logical form to an input sentence.
Formally, a context-free grammar of a language is a four-tuple consisting of a nonterminal vocabulary, a terminal vocabulary, a finite set of production rules, and a start symbol from which all derivations begin. The nonterminal and terminal vocabularies are disjoint. The set of terminal symbols is called the vocabulary of the language.
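The four-tuple definition above can be illustrated with a minimal sketch in Python. The class name `ContextFreeGrammar`, its field names, and the toy grammar are assumptions made for illustration only; they are not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextFreeGrammar:
    """Hypothetical container for the four-tuple (N, T, P, S)."""
    nonterminals: frozenset  # N: nonterminal vocabulary
    terminals: frozenset     # T: terminal vocabulary (the vocabulary of the language)
    productions: tuple       # P: finite set of rules, each (lhs, (rhs symbols...))
    start: str               # S: start symbol

    def __post_init__(self):
        # The two vocabularies must be disjoint.
        if self.nonterminals & self.terminals:
            raise ValueError("nonterminal and terminal vocabularies overlap")
        # The start symbol must be a nonterminal.
        if self.start not in self.nonterminals:
            raise ValueError("start symbol is not a nonterminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            # Context-free: the left-hand side is a single nonterminal.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for symbol in rhs:
                if symbol not in symbols:
                    raise ValueError(f"unknown symbol {symbol!r} in rule for {lhs!r}")


# Toy grammar for illustration only.
TOY_GRAMMAR = ContextFreeGrammar(
    nonterminals=frozenset({"S", "NP", "VP"}),
    terminals=frozenset({"dogs", "bark"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("dogs",)),
        ("VP", ("bark",)),
    ),
    start="S",
)
```

Validating the tuple at construction time means a malformed grammar, such as one with overlapping vocabularies, is rejected before any parsing is attempted.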
Typical natural language processors, however, have realized only limited success because they require complex operations to manipulate the representations of the expressions. Creating such operations with existing methodologies is tedious, and the inflexibility of those methodologies limits the kinds of operations that can be used, resulting in inefficiencies in the translation process.
The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.
A grammar programming language ("GPL") compiler enables a flexible programming language for creating natural language grammars by hiding much of the complexity of manipulating representations of natural language expressions. Each rule in a natural language grammar is compiled by the GPL compiler into a separate function that can be invoked by a translation system to apply the rule to the representation. Furthermore, the GPL compiler can output the functions for the rules as source code for a standard computer programming language to be further compiled into object code that can be directly executed by a computer processor, thus increasing the speed of translating a natural language.
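The idea of compiling each grammar rule into its own function, emitted as source code in a standard programming language, can be sketched roughly as follows. This is only an illustration under stated assumptions: the GPL syntax and the form of the generated functions are not given in this summary, so the rule format `(lhs, rhs)`, the helper names `emit_rule_source` and `compile_rules`, and the use of a flat list of symbols as the "representation" are all hypothetical.

```python
def emit_rule_source(name, lhs, rhs):
    """Return Python source text for one function that applies one rule.

    Hypothetical rule format: lhs is a category label, rhs is a sequence of
    labels. The generated function rewrites the first occurrence of rhs in a
    list of symbols to lhs, or returns None if the rule does not apply.
    """
    return (
        f"def {name}(symbols):\n"
        f"    rhs = {tuple(rhs)!r}\n"
        f"    n = len(rhs)\n"
        f"    for i in range(len(symbols) - n + 1):\n"
        f"        if tuple(symbols[i:i + n]) == rhs:\n"
        f"            return list(symbols[:i]) + [{lhs!r}] + list(symbols[i + n:])\n"
        f"    return None\n"
    )


def compile_rules(rules):
    """Compile each rule into a separate callable, one function per rule."""
    namespace = {}
    for index, (lhs, rhs) in enumerate(rules):
        # The emitted source is itself compiled, standing in for the
        # "source code, then object code" step described above.
        exec(emit_rule_source(f"rule_{index}", lhs, rhs), namespace)
    return [namespace[f"rule_{index}"] for index in range(len(rules))]


# Toy rules for illustration only.
rules = [("NP", ("dogs",)), ("VP", ("bark",)), ("S", ("NP", "VP"))]
apply_np, apply_vp, apply_s = compile_rules(rules)

step1 = apply_np(["dogs", "bark"])  # ['NP', 'bark']
step2 = apply_vp(step1)             # ['NP', 'VP']
step3 = apply_s(step2)              # ['S']
```

In this sketch the translation system only ever calls the generated functions; it never interprets the rule definitions at run time, which is the property the paragraph above attributes to the compiled grammar.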
In one aspect, the GPL compiler generates expansion and combination functions for each rule so that the programmer of the natural language grammar does not have to be concerned with determining when a rule can be applied to the representation of an ambiguous expression, making the task of creating a grammar simpler. Furthermore, the GPL compiler creates expansion and combination functions to more efficiently and quickly perform the translation of such ambiguous expressions.
In another aspect, the GPL compiler handles nested GPL statements, allowing the programmer to easily define multi-layered operations to be carried out on the representations.
The present invention describes systems, clients, servers, methods, and computer-readable media of varying scope. In addition to the aspects and advantages of the present invention described in this summary, further aspects and advantages of the invention will become apparent by reference to the drawings and by reading the detailed description that follows.