The present invention generally relates to recognizing transliterated words and, more particularly, to recognizing transliterated words having inconsistent or alternative spellings.
Electronic documents (including e-mails, texts, etc.) may include transliterated words that originate from a language other than the primary language of the document. For example, a document may be written in English but include words from the Russian language (e.g., if the document is written for an English-speaking audience but references Russian names, cities, sites, etc.).
When writing an electronic document in a primary language (e.g., English), different writers often spell a transliterated word differently or mistype the transliterated word. As such, recognizing different and inconsistent spellings of transliterated words can be problematic, particularly in data analysis systems, spellcheck/autocorrect systems, and/or other types of systems in which the recognition of transliterated words is crucial.
Word recognition often relies on determining the language of a document, which can be problematic when a document includes words from different languages. Moreover, word recognition may require the use of a stemmer, which may not exist for a given language and, even when one exists, may not properly identify a word.