The web is becoming less dominated by English-language content and English-speaking users. A challenging but highly desirable task that accompanies this growth is to analyze and organize web content written in different languages so that it is easily accessible to all users. Some work has been conducted toward this goal, including research involving statistical machine translation. However, previous studies usually depend on bilingual or multilingual dictionaries, or on parallel corpora in which texts in one language and their translations in other languages are well aligned at the word or sentence level. Such dictionaries and corpora are usually constructed by human editors, are domain specific, and are expensive to scale up, which restricts how readily this research can be adapted to many languages and domains.