Devices and methods for automatically translating documents from one language to another are known. However, these devices and methods often fail to translate documents accurately, can consume large amounts of time and can be inconvenient to use. In addition to human translators, known approaches include commercially available machine translation software. These known systems have flaws that render them susceptible to errors, slow speed and inconvenience. Known translation devices and methods cannot consistently return accurate translations for text input and therefore frequently require intensive user intervention for proofreading and editing. Accurate machine translation is more complicated than providing devices and methods that make word-for-word translations of documents. In word-for-word systems, the translation often makes little sense to readers of the translated document, because the word-for-word method results in wrong word choices and incoherent grammatical units.
To overcome these deficiencies, known translation devices have for decades attempted to choose word translations within the context of a sentence based on a combination or set of lexical, morphological, syntactic and semantic rules. These systems, known in the art as “Rule-Based” machine translation (MT) systems, are flawed because there are so many exceptions to the rules that they cannot provide consistently accurate translation.
In addition to Rule-Based MT, in the last decade a new method for MT known as “example-based” machine translation (EBMT) has been developed. EBMT makes use of sentences (or possibly portions of sentences) stored in two different languages in a cross-language database. When a translation query matches a sentence in the database, the database returns the stored target-language sentence, providing an accurate translation in the second language. If a portion of a translation query matches only a portion of a sentence in the database, these devices attempt to determine which portion of the target-language sentence mapped to the matching source-language sentence is the translation of the query.
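The two EBMT lookup cases described above can be illustrated with a minimal sketch. It is not taken from any particular EBMT system: the sentence pairs, the function names (`normalize`, `exact_match`, `partial_matches`) and the whole-word matching rule are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical toy cross-language database (English -> Spanish sentence pairs).
SENTENCE_PAIRS = {
    "good morning": "buenos días",
    "where is the station": "dónde está la estación",
    "i would like a coffee": "quisiera un café",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(text.lower().split())


def exact_match(query: str) -> Optional[str]:
    """Return the stored translation when the whole query is in the database."""
    return SENTENCE_PAIRS.get(normalize(query))


def partial_matches(query: str) -> list[tuple[str, str]]:
    """Return stored pairs whose source sentence contains the query as a
    contiguous run of whole words."""
    q = normalize(query).split()
    if not q:
        return []
    hits = []
    for source, target in SENTENCE_PAIRS.items():
        words = source.split()
        for i in range(len(words) - len(q) + 1):
            if words[i:i + len(q)] == q:
                hits.append((source, target))
                break
    return hits


if __name__ == "__main__":
    print(exact_match("Good morning"))     # whole-sentence hit
    print(exact_match("the station"))      # no whole-sentence hit
    print(partial_matches("the station"))  # containing sentence pair only
```

The partial-match case returns the entire stored sentence pair. Nothing in the database indicates which part of the target sentence corresponds to "the station", which is the determination such devices must attempt.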
EBMT systems cannot provide accurate translation across the breadth of a language because the databases of potentially infinite cross-language sentences are built manually and will always be predominantly “incomplete.” Another flaw of EBMT systems is that partial matches are not reliably translated. Systems that use statistical machine translation attempt to automate the creation of cross-language databases using pairs of translated documents in combination with a large corpus of documents in just the target language. None of these systems uses an algorithm that reliably and accurately distills the translations of a sufficient number of words and word-strings from a pair of translated documents to produce a reliable translation.
Some translation devices combine Rule-Based, Statistical MT and/or EBMT engines. Although this combination of approaches may yield a higher rate of accuracy than any one system alone, the results remain inadequate for use without significant user intervention and editing.
The problems faced when attempting to translate documents from one language to another apply more generally to the problem of converting data representing ideas or information in one state, for example, words, into data representing the same ideas in another state, for example, mathematical symbols. In such cases, cross-idea association databases that associate data in one state with equivalent data in the second state must be consulted. Therefore, a need exists for an improved and more efficient method and apparatus for creating dictionaries or databases that associate equivalent ideas in different languages or states (e.g., words, word-strings, sounds, movement and the like) and for translating or converting ideas conveyed by documents in one language or state into the same or similar ideas represented by documents in a second language or state.
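As a purely illustrative sketch of consulting a cross-idea association database, the following converts word-strings into mathematical symbols. The table contents, the `convert` function and the greedy longest-match strategy are assumptions chosen for brevity; they are not the method of the present invention.

```python
# Hypothetical association table: word-strings (state one) mapped to
# mathematical symbols (state two).
ASSOCIATIONS = {
    "plus": "+",
    "minus": "-",
    "times": "*",
    "divided by": "/",
    "equals": "=",
    "two": "2",
    "three": "3",
    "five": "5",
}


def convert(text: str) -> str:
    """Convert words to symbols, trying the longest word-string first.

    Words with no entry in the table are passed through unchanged.
    """
    words = text.lower().split()
    max_len = max(len(key.split()) for key in ASSOCIATIONS)
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in ASSOCIATIONS:
                out.append(ASSOCIATIONS[chunk])
                i += n
                break
        else:
            # No association found for any word-string starting here.
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(convert("two plus three equals five"))  # 2 + 3 = 5
    print(convert("six divided by two"))          # six / 2
```

The second call shows the limitation of a small hand-built table: "six" has no associated entry and is left unconverted.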
The invention relates to manipulating content using a cross-idea association database. In particular, the present invention provides a method and apparatus for creating a database of associated ideas and provides a method and apparatus for utilizing that database to convert ideas from one state into other states.
In one embodiment, and by way of example, the present invention provides a method and apparatus for creating a language translation database, where two languages form the database of associated ideas. The present invention also provides a method and apparatus for utilizing that language database to convert documents (representing ideas) from one language to another (or more generally, from one state to another). The present invention is not limited to language translation, although language translation is the preferred embodiment presented here. The database creation aspect of the present invention may be applied to any ideas that are related in some manner but expressed in different states, and the conversion aspect of the present invention may be applied to accurately translate ideas from one state to another.
In another embodiment, the database creation aspect of the present invention can be used to make associations between ideas within a single language and their relationship to one another, to be used in artificial intelligence applications.
The application of the present invention to a language translation embodiment will now be described. As used herein, the terms related to converting, translating, and manipulating are used interchangeably and in their broadest sense.