Many multilingual applications, such as machine translation or cross-language information retrieval software, require a bilingual lexicon to produce the desired translation results. However, manually compiled bilingual dictionaries are often inadequate for this purpose because of their limited coverage. For example, machine translation or cross-language information retrieval software may be unable to correctly translate a first term written in a first language into a second term of the same meaning in a second language because the first term is not in the bilingual dictionary in use. Such terms may be referred to as Out-Of-Vocabulary (OOV) terms. OOV terms may severely degrade the quality of a machine-translated document, or drastically hinder the ability of cross-language information retrieval software to retrieve relevant data.
With the sharp increase in bilingual pages (web pages with content in two or more languages), web mining of term or sentence translations, that is, finding a term or sentence in a first language located near its translation in a second language, can greatly alleviate this problem. Some web mining methods rely on a manually defined set of pattern rules to extract term or sentence translations from web pages, because term translations on a single web page tend to follow similar layout patterns. For example, a parenthetical pattern, in which a first term in a first language is followed by a second term in a second language in parentheses, may be used to extract term translations from bilingual web pages that follow such a pattern.
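By way of illustration only, a manually defined parenthetical pattern rule of the kind described above might be sketched as a regular expression. The sketch assumes, purely for the example, that the first language is Chinese and the second is English; the names `PAREN_PATTERN` and `extract_parenthetical_pairs` are hypothetical and do not come from any particular web mining system.

```python
import re

# Hypothetical pattern rule (an assumption for illustration):
# a run of CJK ideographs, optional whitespace, then a Latin-script
# phrase enclosed in ASCII or full-width parentheses.
PAREN_PATTERN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z \-]*?)\s*[)）]"
)


def extract_parenthetical_pairs(text):
    """Return (first-language term, second-language term) candidates."""
    return [(m.group(1), m.group(2)) for m in PAREN_PATTERN.finditer(text)]


sample = "机器翻译 (machine translation)，信息检索（information retrieval）"
for source_term, target_term in extract_parenthetical_pairs(sample):
    print(source_term, "->", target_term)
```

A rule this simple treats the entire run of ideographs before the parenthesis as the first term, so on running text it may capture surrounding words along with the term itself. It also matches only pages that happen to use this one layout.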