Bilingual dictionaries are valuable for many applications, such as machine translation, cross-language information retrieval, and information exchange in electronic commerce. However, current techniques for making bilingual dictionaries require manual input, review, and editing of dictionary entries, which is expensive and time-consuming. In addition, dictionaries constructed this way cannot be updated promptly as new words appear.
One prior approach to dictionary construction uses an automated translation model that is learned from parallel web documents available via the Internet, i.e., web documents for which exact translations exist in a first language and a second language. The model exploits the common organization of the parallel translations to extract translation pairs from the translated documents, and the extracted pairs are used to form a dictionary.
One drawback to such an approach is that it relies upon web documents for which multiple translations are available. Because such documents make up only a small percentage of the total number of documents available on the Internet, it is difficult to build large, comprehensive dictionaries from them.