The speed and facility of international communication have greatly increased in recent decades, but the content of that communication is still encoded in language forms that make it difficult for the vast majority of the world's population to access. There has long been an identified need for translation systems that would allow easier access to digitized information.
Much of the world's communication has recently come to depend on the use of the English language, and many of those using English are by no means native speakers. This presents several problems. Among these are:
1) It takes a great deal of time, and thus an investment of resources, to teach a person to use English effectively.
2) English has various dialects and national forms.
3) When people from diverse linguistic cultures who know English only as a second language try to communicate, serious problems often occur.
No language is spoken by more than a small minority of the world's population. Mandarin Chinese, the most widely spoken, is limited by its geographic distribution and by a complex written form. Projections of the growth of language communities vary, but several languages appear to be growing faster than English. The linguistic dimension of international communication is therefore likely to remain a barrier, even as mechanical means overcome the physical obstacles.
Many aspects of worldwide communication are being rapidly expanded by new technologies, while others lag far behind. The volume of material in digital form is growing, and optical character recognition (OCR) systems and handwriting-scanning methods are making digitization easier. Much digitization, however, is still done by keyboard, typically with the QWERTY layout, an arrangement devised for mechanical typewriters to keep the keys from jamming rather than to maximize typing speed. The development of the Internet has dramatized the need to ease and speed the input of digital information.
The quality of the human-to-machine interface is becoming an important consideration in many fields. The need for error-free data exchange has become urgent, since mistakes can cost lives. "Text-to-voice" technology is developing rapidly, but quality output is hampered by linguistic systems that lack an exact correspondence between written and spoken forms.
The field of Machine Translation (MT) has attracted considerable attention since the late 1940's. Human translation is slow and expensive, and the quality of its output is difficult to gauge unless one already knows both languages well. By the early 1950's, it was hoped that MT would provide a quicker and fully reliable alternative: the dream was that a computer could be supplied with a digitized text in a source language (SL) such as English and automatically render it into a chosen target language (TL) such as Russian.
During the 1950's and 1960's, much of the effort in this field took place in the United States and the Soviet Union, with considerable funding from the two governments in the context of the Cold War. Techniques applied to MT in both countries soon went beyond word-by-word substitution and contextual analysis for choosing among candidate terms; they included various methods of parsing sentences so that information about content could be drawn from sentence structure as well as from individual words.
By the mid-1960's, there was also considerable debate on the value of establishing a universal "pivot-language" to reduce the number of MT processes needed for global communication. Such an idea had been recommended at a 1952 conference at the Massachusetts Institute of Technology (MIT). The idea was that a single language could be chosen into which all potential source languages might be automatically translated; from that pivot-language, texts could then be automatically generated in any target language, saving much effort in the design of systems.
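The saving promised by a pivot-language can be illustrated with a simple count. Assuming that each direction of translation requires its own system and that the pivot is not itself one of the $n$ languages served, direct translation needs one system per ordered pair of languages, whereas the pivot approach needs only one system into and one system out of the pivot for each language:

$$
N_{\text{direct}} = n(n-1), \qquad N_{\text{pivot}} = 2n
$$

For $n = 10$ languages, this is $90$ direct systems versus $20$ pivot-based systems, and the gap widens quadratically as $n$ grows.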
Some suggested using a natural language for this purpose (some early Soviet studies used Russian). Others suggested an artificial language such as Esperanto. In 1954, Dr. Alexander Gode proposed for this purpose "Interlingua," a language that had been developed under his editorship. Other researchers soon developed complex pivot-languages of various types coded in numbers or logical symbols, e.g., I. A. Melchuk in the Soviet Union during the 1960's. But translations to and from such artificial languages proved to be fraught with error as well.
By 1959, Bar-Hillel had already shown that "Fully Automatic High Quality Translation" between two natural languages was intrinsically impossible by machine. But it was a U.S. government report in 1966, the ALPAC Report, that highlighted the limitations of MT techniques and ultimately brought an end to U.S. government funding for MT. Research did continue in the Soviet Union and in Japan (and to a reduced degree elsewhere).
Interest in MT was revived in Europe in 1977, when the European Community began commissioning work in the field. One such project, begun in 1979, was named DLT (Distributed Language Translation) and used Esperanto as its pivot language. It was carried out by the Bureau for Systems Development (BSO) in Utrecht, Netherlands. Early DLT funding came from the European Community; in 1984, the Dutch government provided a grant of US$3.5 million. By the early 1990's, however, the DLT project had ended without producing the desired results.
Many more recent MT methods rely heavily on the frequencies of word sequences and on probability databases. Such methods are more likely to produce readable output, since by nature they recreate word sequences that are not only possible but common in the target language. But this very fluency poses an extremely serious risk to users, who may be lulled into trusting the accuracy of a text precisely because it reads naturally. Such methods tend to produce output that appears highly credible even when it is full of mistakes. Furthermore, users have no way to verify the accuracy of such output unless they have access to someone who knows both the source and target languages and can confirm accuracy or make corrections. The fact remains that traditional MT techniques can only approximate the needed translation, and by their nature they remain prone to introducing dangerous errors into the communication process.
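The gap between fluency and accuracy can be seen in even the simplest frequency-based scorer. The following is a minimal Python sketch, not a model of any particular MT system: it assumes a bigram model with add-one smoothing trained on a tiny toy corpus, and the names `train_bigrams` and `fluency_score`, as well as the sample sentences, are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(corpus):
    """Count words and adjacent word pairs in a toy target-language corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def fluency_score(sentence, unigrams, bigrams):
    """Add-one-smoothed bigram log-probability of a candidate sentence.

    A higher score means the word sequence is more 'common-sounding'.
    The source text is never consulted, so the score says nothing about
    whether the candidate is a faithful translation.
    """
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    score = 0.0
    for prev, word in zip(words, words[1:]):
        score += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return score


# Hypothetical toy corpus of target-language text.
corpus = [
    "the patient was given the dose",
    "the patient was given the correct dose",
    "the doctor checked the dose",
]
unigrams, bigrams = train_bigrams(corpus)

# Suppose the source said "correct dose": the fluent mistranslation still
# scores far above scrambled output that a reader would reject at once.
candidates = [
    "the patient was given the wrong dose",  # fluent but inaccurate
    "dose the given was patient the",        # scrambled
]
for candidate in candidates:
    print(f"{fluency_score(candidate, unigrams, bigrams):.2f}  {candidate}")
```

Because the scorer rewards only how familiar a word sequence is, the inaccurate "wrong dose" candidate outranks word salad by a wide margin; nothing in the computation compares the candidate with the source sentence.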
In the field of linguistics, there has been a long series of efforts to create artificial languages superior to natural ones. Descartes and Leibniz were among the early designers, and there was early hope for language systems with the precision of mathematics. There was some limited success: botanical, zoological, and chemical nomenclatures resulted from such efforts, as did modern symbolic logic, library catalog systems, and even Roget's Thesaurus. Internet search engines still struggle to bring better order to linguistic information.
During the last two centuries, there have been numerous proposals for an "international auxiliary language" (IAL) that could serve as a universal second language. The Esperanto project, launched in 1887, gained a modest following of devotees over the generations. Subsequent projects such as Ido, Otto Jespersen's Novial, Interlingua, and the "logical languages" Loglan and Lojban also have organizations promoting their use. The basic problem remains, however: there is no incentive to learn a novel language that has no speakers and no literature and that will give the learner no advantage unless and until it develops a community of users.
The method of employing a linked alternative language (LAL) as a potential IAL differs markedly from all prior IAL projects in that it provides specific uses of economic value, such as access to data, which are in no way tied to a pre-existing community of users. An LAL serving as an IAL and linked to English (as is possible under this invention) would provide immediate and perfectly translated access to all digitized data currently available in English. Moreover, the methods described here can be used to translate into that IAL all features of the Internet that are digitized in English, and could do so as the data is downloaded by browsers. Such features have never been provided by any IAL project or any MT system.
FIG. 1A illustrates prior-art interlinguistic routes for accessing data in a source language (SL), either a) by the use of a pivot-language or b) by traditional machine translation (MT) methods. Such systems cannot translate without losing information in the process. The present invention is designed to produce "lossless" translation, i.e., a form of translation in which absolutely no semantic content is lost, and none gained, in the translation process.
U.S. Pat. No. 4,667,290 entitled "Compilers using a universal intermediate language," filed Sep. 10, 1984 and issued May 19, 1987, discloses the design of a universal intermediate language, but for use with machine-language code rather than with natural language. U.S. Pat. No. 4,635,199 entitled "Pivot-type machine translating system comprising a pragmatic table for checking semantic structures, a pivot representation, and a result of translation," filed Apr. 30, 1984 and issued Jan. 6, 1987, describes an invention that "relates to a machine translation system of the so-called pivot type." It describes a specific example of a machine translation system using the pivot-language approach, not the methodologies covered in the present invention.
Input by abbreviation is disclosed in U.S. Pat. No. 4,760,528 entitled "Method for entering text using abbreviated word forms," filed Sep. 18, 1985 and issued Jul. 26, 1988, which discloses one specific system for entering digital information into a computer as abbreviations that are automatically expanded; that system does not involve mnemonic principles.
U.S. Pat. No. 4,864,503 entitled "Method of using a created international language as an intermediate pathway in translation between two national languages," filed Feb. 5, 1987 and issued Sep. 5, 1989, describes the use of a "created international language" as a pivot language.
U.S. Pat. No. 5,587,903 entitled "Artificial Intelligence Language Program," issued Dec. 24, 1996, discloses traditional MT methods for converting English sentences into Esperanto, with the user interacting with the program to improve output quality.
U.S. Pat. No. 5,696,980 entitled "Machine Translation System Utilizing Bilingual Equivalence Statements," issued Dec. 9, 1997, discloses an MT system that uses strategies of computational linguistics to improve the quality of output in the target language; it relies on traditional, error-prone MT. Similarly, U.S. Pat. No. 5,768,603 entitled "Method and system for natural language translation," filed Jun. 2, 1995 and issued Jun. 16, 1998, also discloses an error-prone approach, although it seeks to reduce the likelihood of such errors. The techniques of U.S. Pat. No. 5,768,603 apply probabilities or scores to various candidate target-language translations.
Communication systems worldwide, especially the Internet, are moving digitized data at unprecedented and rapidly increasing speeds. But most of that data is cast in linguistic form, and the multiplicity of linguistic cultures renders most of it useless to most of the world's population. The preferred embodiment of this invention can provide a system that simultaneously supplies access to all digitized data now in English (including all web pages and electronic mail now on the Internet in English), provides a viable IAL, supplies a far more reliable human-machine interface, and meets a wide variety of other communication and information-management needs of the modern world. A related embodiment, which allows delimited multilingual translation on a digital-string-to-digital-string basis using a plurality of natural-language databases closely linked within the constraints of a template format, can facilitate use of the IAL while also giving Internet users a useful tool for communicating across linguistic barriers.