Despite the prevalence of electronic media in today's world, a considerable amount of written communication is still in paper form, such as books, bank checks, and contracts. There is an increasing demand for automated information extraction from documents and for automated document classification, search, and retrieval.
Recognition of printed characters by computer has been one of the first and most successful applications of pattern recognition. Optical Character Recognition (OCR) has been an active field of research for more than three decades. Hundreds of approaches have been proposed for the recognition of machine-printed and handwritten characters in different scripts. For machine-printed Latin scripts, the problem can be considered solved, at least when the level of noise is low. In applications where clear imaging is available, typical recognition rates for machine-printed characters exceed 99%. The difficulty lies in dealing with handwritten characters and words, particularly when the images are noisy. Handwriting is hard to recognize because there can be as many handwriting styles as there are people. In fact, it is widely believed that each individual's handwriting is unique. In forensic science, handwriting identification, which is the study of identifying or verifying the writer of a given handwritten document, is based on the principle that no two people's handwriting is exactly alike. This means that a handwritten character or word can take a very large number of forms, making recognition a difficult task even for humans.
Accordingly, there is a need for a complete methodology for the spotting of arbitrary keywords in handwritten document images that can handle the challenges that exist in real-world situations.