Ever since improvements in transportation began to reduce the inconvenience and cost of cross-border travel, the desirability of universal communication has been recognized. In the 1960s, for example, international efforts were made to promote Esperanto as a universal language. While that effort ultimately failed, the large number of fluent speakers--estimates range from 1 to 15 million worldwide--and the scope of the efforts illustrate the problem's importance. Esperanto did not succeed because it required acquisition of both a new grammar and a new vocabulary, the latter posing the far greater challenge for would-be speakers.
The improving ease and speed with which information can now be transmitted worldwide have increased the need for universal communication. Current efforts have focused most heavily on automated translation among languages. Systems now in use generally store, in a source and a target language, millions of frequently used words, phrases and combinations, relying for accuracy and robustness on the likelihood that these stored entries will occur in the text to be translated. Such systems are by definition incomplete, since no system can store every possible word combination, and their usefulness varies with the linguistic idiosyncrasies of their designers and users. It is almost always necessary for a human to check and modify the output translation. These systems also translate one word or phrase at a time (and so operate slowly), and require a separate database for each target language. Moreover, because they are programmed to recognize distinctive language characteristics and the unique mappings from one language to another, each translation must be performed individually; the time required for multiple translations is therefore the sum of the times for each translation performed separately.
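The limitations described above can be illustrated with a minimal sketch of a conventional phrase-lookup translator. The phrase tables, entries and language codes below are invented for illustration only; real systems store millions of entries, but the structural drawbacks--one database per target language, gaps for unstored combinations, and one independent pass per target--are the same.

```python
# Hypothetical sketch of a conventional phrase-lookup translator: one
# database per target language, greedy longest-match lookup, and a
# separate pass for each target. All entries are illustrative.

PHRASE_TABLES = {
    "fr": {"good morning": "bonjour", "thank you": "merci"},
    "de": {"good morning": "guten Morgen", "thank you": "danke"},
}

def translate(text, target):
    table = PHRASE_TABLES[target]          # separate database per language
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        # Greedy longest match against stored phrases; unmatched words
        # pass through untranslated -- the "incomplete by definition" gap
        # that forces a human to check and modify the output.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in table:
                out.append(table[phrase])
                i = j
                break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

Because each target language requires its own pass over its own database, translating into N languages takes roughly N times as long as translating into one, matching the additive cost noted above.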
Translation is difficult for numerous reasons, including the lack of one-to-one word correspondences among languages, the existence in every language of homonyms, and the fact that natural grammars are idiosyncratic: they do not conform to an exact set of rules that would permit direct, word-for-word substitution. It is toward a computational "understanding" of these idiosyncrasies that many artificial-intelligence research efforts have been directed, and their limited success testifies to the complexity of the problem.
U.S. Pat. No. 5,884,247 (issued Mar. 16, 1999) describes an approach toward language translation in which natural-language sentences are represented in accordance with a constrained grammar and vocabulary structured to permit direct substitution of linguistic units in one language for corresponding linguistic units in another language. The vocabulary may be represented in a series of physically or logically distinct databases, each containing entries representing a form class as defined in the grammar. Translation involves direct lookup between the entries of a reference sentence and the corresponding entries in one or more target languages.
In accordance with the '247 patent, sentences may be composed of "linguistic units," each of which may be one or a few words, from the allowed form classes. The list of all allowed entries in all classes represents the global lexicon, and to construct an allowed sentence, entries from the form classes are combined according to fixed expansion rules.
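The direct-lookup scheme described above can be sketched as follows. The form classes, entries and alignment shown are hypothetical stand-ins, not the actual databases of the '247 patent; the point is only that when vocabulary entries occupy corresponding slots across languages, translation reduces to index lookup rather than parsing.

```python
# Hypothetical sketch of direct substitution across form-class databases:
# each class holds entries aligned by position across languages, so a
# linguistic unit translates by positional correspondence. Entries are
# invented for illustration.

NOUNS = {
    "en": ["dog", "house"],
    "fr": ["chien", "maison"],
}
VERBS = {
    "en": ["sees", "builds"],
    "fr": ["voit", "construit"],
}

def lookup(unit, form_class, src, tgt):
    """Translate one linguistic unit by positional correspondence."""
    idx = form_class[src].index(unit)      # slot in the source database
    return form_class[tgt][idx]            # same slot in the target database

# A reference sentence translates unit by unit, class by class:
sentence = [("dog", NOUNS), ("sees", VERBS), ("house", NOUNS)]
translated = [lookup(u, fc, "en", "fr") for u, fc in sentence]
```

Because the lookup is purely positional, adding a further target language means adding one more aligned column, not a new translation engine.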
Sentences in accordance with the '247 patent are constructed from terms in the lexicon according to four expansion rules. In essence, the expansion rules serve as generic blueprints according to which allowed sentences may be assembled from the building blocks of the lexicon. These few rules are capable of generating a limitless number of sentence structures. This is advantageous in that the more sentence structures that are allowed, the more precisely meaning can be conveyed within the constrained grammar. On the other hand, this approach renders computationally difficult the task of checking user entries in real time for conformance to the constrained grammar.