Extracting data (e.g., text, numbers, and symbols) from images (e.g., filled forms, drawings, and digital documents) and building meaningful information from the extracted data is a complex and time-consuming task, because many different text elements, numbers, and symbols must be identified and correlated. Typically, such data extraction and information building is done manually and is prone to human error. More recently, computer-based systems have been employed to automatically extract data and build meaningful information from digital images. Many such systems employ optical character recognition (OCR) techniques to extract data from the digital images.
OCR is a computer-based translation of an image of text into digital form as machine-editable text (i.e., text data), generally in a standard encoding scheme. The OCR process therefore eliminates the need to manually type the document into the computer system. Many existing OCR techniques identify, recognize, and position the characters from a textual image so as to generate text data. Most of the existing techniques for placing the recognized characters in their respective positions in the text data use the pixel coordinates of each character. However, using pixel coordinates to position the characters in the text data, subsequent to character recognition, is cumbersome because it involves exhaustive identification and calculation of pixel positions. Other existing techniques for positioning the characters inside the text data also require extensive image analysis, demanding a large amount of processing time and capability.
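By way of illustration only, the following minimal sketch shows what pixel-coordinate-based positioning of recognized characters may look like. It is not taken from any particular OCR system. The `CharBox` record, the `layout_text` function, and the `space_ratio` threshold are hypothetical names and values, and the line-grouping rule (comparing vertical centres) is an assumed heuristic.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """Hypothetical record for one recognized character and its pixel box."""
    char: str
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width, in pixels
    h: int  # height, in pixels


def layout_text(boxes: List[CharBox], space_ratio: float = 0.5) -> str:
    """Rebuild text from character boxes using only pixel coordinates.

    Assumed heuristic: two characters share a text line when their vertical
    centres differ by at most half the height of the first character placed
    on that line. A space is inserted when the horizontal gap between
    neighbours exceeds space_ratio times the previous character's width.
    """
    lines: List[List[CharBox]] = []
    # Visit characters from top to bottom by vertical centre.
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        centre = box.y + box.h / 2
        if lines:
            ref = lines[-1][0]
            if abs(centre - (ref.y + ref.h / 2)) <= ref.h / 2:
                lines[-1].append(box)
                continue
        lines.append([box])

    rendered = []
    for line in lines:
        line.sort(key=lambda b: b.x)  # left-to-right within the line
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.w)
            if gap > space_ratio * prev.w:
                text += " "
            text += cur.char
        rendered.append(text)
    return "\n".join(rendered)


# Characters arrive in arbitrary order; coordinates alone fix their placement.
sample = [
    CharBox("i", 12, 0, 10, 20),
    CharBox("H", 0, 0, 10, 20),
    CharBox("b", 30, 30, 10, 20),
    CharBox("a", 0, 30, 10, 20),
]
print(layout_text(sample))
```

Even in this simplified form, every character requires coordinate comparisons for line grouping, ordering, and spacing, which suggests why such positioning becomes cumbersome for full-page images.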