Field of the Invention
The present invention relates to the field of image processing, specifically to image processing using optical character recognition (OCR) technology.
Description of the Related Art
Optical Character Recognition (OCR) is the electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded, computer-readable text. Modern optical character recognition technologies actively use training as part of the recognition process. During OCR, a recognition pattern is created, and the training process is then used to refine the result. Recognition with training is often used when the text being processed contains decorative or special fonts, or special symbols such as mathematical symbols and characters from a rare alphabet. The training process includes creation of user patterns by the system. As part of the pattern creation process, the images of characters (graphemes) that need to be paired with recognized characters are identified and presented to a user, and the user is asked to assign characters to the presented images. The images of characters (graphemes) are usually derived from a part of the document that is set aside for training patterns. The training results in a set of patterns for the images of characters (graphemes) that were encountered in the training document. The set of patterns thus created is subsequently used during recognition of the main part of the document.
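The training flow described above can be illustrated with a minimal sketch. It is not taken from any particular OCR product, and it makes several simplifying assumptions: a grapheme image is modeled as a small tuple of bitmap rows, patterns are matched by exact equality rather than by a similarity measure, and the names `train_patterns`, `recognize`, and `ask_user` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical representation: a grapheme image is a tuple of bitmap rows.
Grapheme = Tuple[str, ...]


def train_patterns(
    training_graphemes: Iterable[Grapheme],
    ask_user: Callable[[Grapheme], str],
) -> Dict[Grapheme, str]:
    """Build user patterns from the training part of a document.

    Each distinct grapheme image is presented to the user once, and the
    character the user assigns is stored as the pattern for that image.
    """
    patterns: Dict[Grapheme, str] = {}
    for image in training_graphemes:
        if image not in patterns:  # do not ask again for an image already paired
            patterns[image] = ask_user(image)
    return patterns


def recognize(
    graphemes: Iterable[Grapheme],
    patterns: Dict[Grapheme, str],
    unknown: str = "?",
) -> str:
    """Recognize the main part of the document using the trained patterns.

    Assumption: exact-match lookup stands in for real pattern matching.
    Images without a trained pattern are emitted as the `unknown` marker.
    """
    return "".join(patterns.get(image, unknown) for image in graphemes)
```

In this sketch, `ask_user` stands in for the interactive step in which the system shows a grapheme image and the user assigns a character to it. Graphemes that never appeared in the training part of the document remain unrecognized.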
Document verification is also often used as part of the character recognition process. Verification improves the quality of character recognition by allowing recognition inaccuracies to be corrected.