Text recognition is known. The current state of the art is generally based on computational models and tools that operate on the entire scanned text image stored in the system. That is, typically every scanned pixel of the text is used for text recognition. Storing the full image typically requires a large amount of storage space. Further, these methods are time-consuming, because the computation and processing for character recognition and classification operate on a large amount of data.
Automatic recognition of Asian characters may be a challenge. There have been some efforts to explore methods that use the top and bottom views of scanned text. However, these top-and-bottom-view methods have not been successful for the many Asian languages that use a ‘Matra’. A Matra is a horizontal line that runs along the top of the characters. This line reduces the usefulness of the top view for character recognition in these languages, because the top view of many of their characters is exactly the same.
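The effect of the Matra on the top view can be illustrated with a minimal sketch. It assumes, purely for illustration, that a "view" is a per-column profile giving the distance from the image edge to the first ink pixel, and it uses two hypothetical 5x5 toy glyphs rather than real characters. The names top_view, bottom_view and add_headline are illustrative and do not come from any particular recognition system.

```python
def top_view(glyph):
    """Per column: rows from the top edge to the first ink pixel (height if the column is empty)."""
    height, width = len(glyph), len(glyph[0])
    profile = []
    for col in range(width):
        depth = height  # default when the column contains no ink
        for row in range(height):
            if glyph[row][col]:
                depth = row
                break
        profile.append(depth)
    return profile


def bottom_view(glyph):
    """Same profile measured from the bottom edge (the glyph is flipped vertically)."""
    return top_view(glyph[::-1])


def add_headline(glyph):
    """Replace the top row with a full-width line, a crude stand-in for a Matra."""
    return [[1] * len(glyph[0])] + [list(row) for row in glyph[1:]]


# Two hypothetical toy glyphs (1 = ink, 0 = background).
glyph_a = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
glyph_b = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

# Without a headline, the top views differ and can help tell the glyphs apart.
print(top_view(glyph_a), top_view(glyph_b))

# With a headline, both top views collapse to all zeros; only the bottom views still differ.
lined_a, lined_b = add_headline(glyph_a), add_headline(glyph_b)
print(top_view(lined_a), top_view(lined_b))
print(bottom_view(lined_a), bottom_view(lined_b))
```

In this idealized setting the headline is the first ink met in every column, so the top profile carries no information about the character body beneath it. Real scans would add noise, skew and gaps in the line, which this sketch ignores.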
The Matra is prevalent in many Asian languages, including Bengali and Hindi. Other scripts for which character recognition is difficult include traditional Mongolian script and its offshoots, such as Manchu, which are written vertically.
Another difficulty affecting the ability to correctly recognize even typed language symbols is the complex irregularity among characters across different languages, fonts, styles and sizes. This irregularity widens when one deals with handwritten characters or widely varying fonts of a given language.