Optical Character Recognition (OCR) generally refers to a machine or computer-implemented method for recognizing a string of characters appearing in an input image and returning a corresponding output string of characters (e.g. in machine-encoded text form). Generally, an OCR process includes the steps of acquiring an input image containing a string of characters to be recognized, recognizing individual characters in the input image as characters of an alphabet, and returning a corresponding output string of characters.
OCR has a wide range of applications including the recognition of vehicle license plate numbers (for use in automated traffic law enforcement, surveillance, access control, tolling, etc.), the recognition of serial numbers on parts in an automated manufacturing environment, the recognition of labels on packages for routing purposes, and various document analysis applications.
Despite sophisticated OCR techniques, OCR errors frequently occur due to the non-ideal conditions of image acquisition, the partial occlusion or degradation of the depicted characters, and especially the structural similarity between certain characters (e.g. Z and 2, O and D, 1 and I). For example, the recognition of vehicle license plate numbers must overcome lighting conditions that are both variable (according to the time of day, weather conditions, etc.) and non-uniform (e.g. due to shadows and specular reflection), perspective distortion, and partial occlusion or degradation of the characters (e.g. due to mud, wear of the paint, etc.).
To improve the overall performance of OCR systems, it is essential to include a post-processing stage, during which OCR errors are automatically detected and corrected.
A popular technique to automatically correct errors in words is “dictionary lookup”: an incorrect word, that is, one that does not belong to a predefined “dictionary” of valid words, is replaced by the closest valid word in the dictionary. This is often achieved by selecting the dictionary word having the minimum “edit distance” to the incorrect word. The edit distance between two strings is the minimum number of edit operations (deletions, insertions, and substitutions) required to transform the first string into the second string. The edit distance has been generalized to an edit cost by assigning a weight to each edit operation according to the type of operation and/or the character(s) of the alphabet involved in the operation.
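By way of illustration only, dictionary lookup with a weighted edit cost might be sketched as follows. The function names (`edit_cost`, `closest_word`, `confusable_sub`), the weight of 0.5 for structurally similar characters, and the sample dictionary are assumptions made for this sketch and are not taken from any particular OCR system.

```python
def edit_cost(source, target, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Minimum total cost of the edit operations turning `source` into `target`.

    With the default unit weights this is the classic edit distance.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    # prev[j]: cost of building target[:j] from an empty prefix of source.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, a in enumerate(source, start=1):
        curr = [i * del_cost]  # delete the first i source characters
        for j, b in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,          # delete a
                curr[j - 1] + ins_cost,      # insert b
                prev[j - 1] + sub_cost(a, b) # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


# Assumed weights: substitutions between look-alike characters are cheaper.
CONFUSABLE = {frozenset("Z2"), frozenset("OD"), frozenset("1I")}


def confusable_sub(a, b):
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0


def closest_word(word, dictionary, **weights):
    """Return the dictionary word with the minimum edit cost to `word`."""
    return min(dictionary, key=lambda w: edit_cost(word, w, **weights))


print(closest_word("he1lo", ["hello", "help", "yellow"]))
print(edit_cost("Z", "2", sub_cost=confusable_sub))
```

Making the substitution weight depend on the characters involved lets a correction that swaps Z for 2 rank ahead of one that swaps Z for an unrelated character, which is the kind of generalization from edit distance to edit cost described above.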
Methods of automatic string correction based on the dictionary lookup paradigm are useful in cases where valid input strings are those belonging to a limited dictionary of valid strings. However, they are inadequate to correct strings that are not of the word-type. There are an increasing number of OCR applications in which valid strings are not words but strings satisfying a “template” of some sort; such strings include vehicle license plate numbers, serial numbers, ID numbers, ZIP codes, etc.
One existing method of string correction based on a “template” involves the determination of a set of edit operations needed to be performed on a string read from an image in order to satisfy a predefined template. A minimum cost of performing edit operations on the string to satisfy the template (i.e. minimum edit distance from string to template) is first determined, after which the edit operations corresponding to this minimum cost are identified and then applied to the string to generate a corrected string that satisfies the template.
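The two phases of such a template-based method (determining the minimum cost, then identifying and applying the corresponding edit operations) might be sketched as follows. This is a simplified illustration under stated assumptions, not the existing method itself: the template is assumed to be a fixed-length list of allowed character sets, every edit operation is assumed to cost 1, and the `LOOKALIKE` map and `pick` helper used to choose replacement characters are hypothetical.

```python
import string

LETTER = set(string.ascii_uppercase)
DIGIT = set(string.digits)

# Hypothetical look-alike map, used only to choose a replacement character.
LOOKALIKE = {"Z": "2", "2": "Z", "I": "1", "1": "I"}


def pick(allowed, observed=None):
    """Choose a character from `allowed`, preferring a look-alike of `observed`."""
    alt = LOOKALIKE.get(observed)
    if alt in allowed:
        return alt
    return min(allowed)  # arbitrary but deterministic fallback


def correct_to_template(text, template):
    """Return (minimum edit cost, corrected string satisfying the template)."""
    n, m = len(text), len(template)
    # cost[i][j]: minimum cost of editing text[:i] to satisfy template[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        cost[0][j] = j  # insert j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if text[i - 1] in template[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # keep or substitute
                             cost[i - 1][j] + 1,             # delete
                             cost[i][j - 1] + 1)             # insert

    # Trace back the operations that achieve the minimum cost and apply them.
    out, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            ch, allowed = text[i - 1], template[j - 1]
            mismatch = 0 if ch in allowed else 1
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                out.append(ch if mismatch == 0 else pick(allowed, ch))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # deletion: the character is dropped
            continue
        out.append(pick(template[j - 1]))  # insertion
        j -= 1
    return cost[n][m], "".join(reversed(out))


plate_template = [LETTER] * 3 + [DIGIT] * 3  # assumed format: three letters, three digits
print(correct_to_template("ABCI23", plate_template))
```

Note that the sketch only ever sees the single string returned by the OCR process; when several characters would satisfy a template position at the same cost, it has no information from the image to choose between them.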
In the aforementioned dictionary lookup methods and template-based method, only limited information related to the OCR process is considered during the OCR post-processing stage. In particular, during the OCR process, many potential matches for model characters may have been identified in a particular region of the input image, with a recognition score attributed to each potential match. However, only information relating to the selected match (e.g., the potential match having the highest recognition score) is considered during the OCR post-processing stage in determining the minimum edit cost. This limits the ability of such OCR post-processing methods to detect OCR errors and return a correct output string.
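The loss of information described above can be illustrated with a short sketch. The per-region candidate lists and their recognition scores below are invented purely for illustration, and the function name `select_top_matches` is hypothetical.

```python
# Hypothetical OCR output: for each character region of the input image,
# a list of (model character, recognition score) potential matches.
region_candidates = [
    [("A", 0.93), ("4", 0.31)],
    [("Z", 0.52), ("2", 0.50)],  # nearly tied: Z and 2 are structurally similar
    [("7", 0.88), ("1", 0.40)],
]


def select_top_matches(candidates):
    """Keep only the highest-scoring match per region, as a conventional
    OCR stage might do before handing a string to post-processing."""
    return "".join(max(matches, key=lambda m: m[1])[0] for matches in candidates)


candidate_string = select_top_matches(region_candidates)
print(candidate_string)  # the only information post-processing receives
```

Once the string has been formed, the near tie between Z and 2 in the second region is no longer visible, so a post-processing stage that must substitute that character cannot tell that 2 was almost as likely a match as Z.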
There thus exists a need in the industry for an improved method and system for processing candidate strings generated by an optical character recognition process.