The present method relates generally to character reading and more specifically to a robust technique for recognizing character strings in grayscale images where such strings may be of poor contrast or where some characters in the text string or the entire text string may be distorted or partially obscured.
Various approaches have been applied to improve the classification accuracy of optical character recognition (OCR) methods. The present method relates generally to optical character recognition and more specifically to a technique for recognizing character strings in grayscale images where such strings may be of poor contrast or variable in position or rotation with respect to other characters in the string, or where characters in the string may be partially obscured.
Different challenges are posed in many industrial machine vision character reading applications, such as semiconductor wafer serial number identification, semiconductor chip package print character verification, vehicle tire identification, license plate reading, etc. In these applications, the font, size, and character set are well defined yet the images may be low contrast, individual or groups of characters imprinted in the application may be skewed in rotation or misaligned in position or both, characters may be partially obscured, and the image may be acquired from objects under varying lighting conditions, image system distortions, etc. The challenge in these cases is to achieve highly accurate, repeatable, and robust character reading results.
Character recognition in digital computer images is an important machine vision application. Prior art optical character recognition methods work well (i.e. achieve high classification accuracy) when image contrast is sufficient to separate, or segment, the text from the background. In applications such as document scanning, the illumination and optical systems are designed to maximize signal contrast so that foreground (text) and background separation is easy. Furthermore, conventional approaches require that the characters be presented in their entirety and not be obscured or corrupted to any significant degree. While this is possible with binary images acquired from a scanner or grayscale images acquired from a well controlled low noise image capture environment, it is not possible in a number of machine vision applications such as parts inspection, semiconductor processing, or circuit board inspection. These industrial applications are particularly difficult to deal with because of poor contrast or character obscuration. Applications such as these suffer from a significant degradation in classification accuracy because of the poor characteristics of the input image. The method described herein utilizes two approaches to improve classification accuracy: (1) using region-based hit or miss character correlation and (2) field context information.
In the preferred embodiment, the invention described herein is particularly well suited for optical character recognition on text strings with poor contrast and partial character obscuration, as is typically the case in the manufacture of silicon wafers. Many semiconductor manufacturers now include a vendor code on each wafer for identification purposes and to monitor each wafer as it moves from process to process. The processing of silicon wafers involves many steps such as photolithographic exposure, etching, baking, and various chemical and physical processes. Each of these processes has the potential for corrupting the vendor code. Usually the corruption results in poor contrast between the characters and the background for some portion of the vendor code. In more severe cases, some of the characters may be photolithographically overwritten (exposed) with the pattern of an electronic circuit. This type of obscuration is difficult if not impossible to accommodate with prior art methods. Another possibility is that the vendor code will be written a character at a time (or in character groups) as processes accumulate. This can result in characters within the text string that are skewed or rotated with respect to the alignment of the overall text string.
Computerized document processing includes scanning of the document and the conversion of the actual image of a document into an electronic image of the document. The scanning process generates an electronic pixel representation of the image with a density of several hundred pixels per inch. Each pixel is at least represented by a unit of information indicating whether the particular pixel is associated with a 'white' or a 'black' area in the document. Pixel information may include colors other than 'black' and 'white', and it may include gray scale information. The pixel image of a document may be stored and processed directly or it may be converted into a compressed image that requires less space for storing the image on a storage medium such as a storage disk in a computer. Images of documents are often processed through OCR (Optical Character Recognition) so that the contents can be converted back to ASCII (American Standard Code for Information Interchange) coded text.
In image processing and character recognition, proper orientation of the image on the document to be processed is advantageous. One of the parameters to which image processing operations are sensitive is the skew of the image in the image field. The present invention provides for pre-processing of individual characters to eliminate skew and rotation characteristics detrimental to many image processing operations either for speed or accuracy. The present invention also accommodates characters that may be partially corrupted or obscured.
Prior art attempts to improve character classification accuracy by performing a contextual comparison between the raw OCR string output from the recognition engine and a lexicon of permissible words or character strings containing at least a portion of the characters contained in the unknown input string (U.S. Pat. No. 5,850,480 by Scanlon et al. entitled "OCR error correction methods and apparatus utilizing contextual comparison", Second Preferred Method Embodiment, paragraphs 2-4). Typically, replacement words or character strings are assigned confidence values indicating the likelihood that the string represents the intended sequence of characters. Because Scanlon's method requires a large lexicon of acceptable string sequences, it is computationally expensive to implement since comparisons must be made between the unknown sequence and all of the string sequences in the lexicon. Scanlon's method is limited to applications where context information is readily available. Typical examples of this type of application include processing forms that have data fields with finite contents, such as computerized forms where city or state fields have been provided.
Other prior art approaches (U.S. Pat. No. 6,154,579 by Goldberg et al. entitled "Confusion Matrix Based Method and System for Correcting Misrecognized Words Appearing in Documents Generated by an Optical Character Recognition Technique", Nov. 28, 2000, Detailed Description of the Invention, paragraphs 4-7 inclusive) improve overall classification accuracy by employing a confusion matrix based on sentence structure, grammatical rules or spell checking algorithms subsequent to the primary OCR recognition phase. Each reference word is assigned a replacement word probability. This method, although effective for language based OCR, does not apply to strings that have no grammatical or structural context such as part numbers, random string sequences, encoded phrases or passwords, etc. In addition, Goldberg's approach does not reprocess the image to provide new input to the OCR algorithm.
Other prior art methods improve classification performance by utilizing a plurality of OCR sensing devices as input (U.S. Pat. No. 5,807,747 by Bradford et al. entitled "Apparatus and method for OCR character and confidence determination using multiple OCR devices", Sep. 8, 1998, Detailed Description of the Preferred Embodiments, paragraphs 4-7 inclusive). With this approach a bitmapped representation of the text from each device is presented to the OCR software for independent evaluation. The OCR software produces a character and an associated confidence level for each input device, and the results of each are presented to a voting unit that tabulates the overall results. This technique requires additional costly hardware and highly redundant processing of the input string, yet it does not resolve misalignment, rotation, or obscuration degradations of the input. It is not useful for correcting impairment caused by character motion, or in applications where character images are received sequentially in time from a single source, and it does not learn correlation weights to minimize source image noise.
It is an object of this invention to use region-based normalized cross-correlation to increase character classification accuracy by reducing the contribution to the overall score of portions of a character that may be obscured.
It is an object of this invention to use morphological processing to determine the polarity of the text relative to the background.
It is an object of this invention to use structure guided morphological processing and grayscale dispersion to identify the location of a text string in a grayscale image.
It is an object of this invention to adjust the skew prior to correlation with the feature template to minimize the number of correlation operations required for each character.
It is an object of this invention to adjust the individual character rotation prior to correlation with the feature template to minimize the number of correlation operations required for each character and to enhance accuracy.
It is an object of this invention to treat the character input region of interest (ROI) as a mixture of two separate populations (background, foreground) of grayscale values and to adaptively determine the optimal threshold value required to separate these populations.
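The two-population view of the ROI can be illustrated with a minimal sketch. The text above does not state which criterion selects the threshold; the sketch assumes Otsu's between-class variance criterion as one common choice, and the function name `otsu_threshold` and the 256-level grayscale range are illustrative assumptions rather than part of the described method.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best splits pixels into two populations.

    Pixels with value <= t form one population and pixels > t the other.
    Assumption: Otsu's between-class variance criterion is used; the source
    text only says the threshold is determined adaptively.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray values at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue    # lower population still empty
        if w1 == 0:
            break       # upper population empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Because the threshold is recomputed from the histogram of each ROI, it follows changes in contrast and illumination from one image to the next instead of relying on a fixed gray level.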
It is an object of this invention to improve character classification accuracy by applying field context rules that govern the types of alphanumeric characters that are permissible in the field being processed and hence the specific correlations that will be performed.
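A minimal sketch of how field context rules could limit the correlations performed is given here. The format codes `A` (letter) and `N` (digit), the `FIELD_RULES` table, and the `score` callable standing in for the correlation operation are all hypothetical names introduced for illustration; the rules for a real field would come from the application.

```python
import string

# Hypothetical rule table: one code per character position in the field.
FIELD_RULES = {
    "A": set(string.ascii_uppercase),                   # letters only
    "N": set(string.digits),                            # digits only
    "X": set(string.ascii_uppercase + string.digits),   # either
}


def read_field(char_images, field_format, score):
    """Classify each character using only the templates its position permits.

    char_images  : one image object per character position
    field_format : string of rule codes, e.g. "AN" for a letter then a digit
    score        : callable (image, candidate_char) -> correlation score;
                   a stand-in for the correlation step described in the text
    """
    text = []
    for image, kind in zip(char_images, field_format):
        candidates = sorted(FIELD_RULES[kind])
        # Only permitted characters are ever correlated against the image.
        text.append(max(candidates, key=lambda c: score(image, c)))
    return "".join(text)
```

Restricting the candidate set removes confusable pairs such as the letter O and the digit 0 whenever the field format allows only one of them at a given position, and it also reduces the number of correlations computed.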
It is an object of this invention to decrease the weight on portions of the character that exhibit high variation and ultimately contribute to a less reliable classification such that they contribute less to the overall hit correlation score Hn(P). Portions of the character that exhibit less variation during the learning process are consequently weighted higher making their contribution to the hit (or miss) correlation score more significant.
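The weighting idea can be sketched as follows, under stated assumptions. The text above does not give the weighting formula, so the inverse relation `1 / (1 + variance)` and the function name `learn_weights` are illustrative assumptions, not the formula behind Hn(P).

```python
from statistics import pvariance


def learn_weights(samples):
    """Derive per-pixel weights from learning samples of one character.

    samples : list of equal-length flat pixel lists, all showing the same
              character. Pixels whose gray value varies strongly across the
              samples receive low weight; stable pixels receive high weight.
    Assumption: weight is proportional to 1 / (1 + variance); the source
    text does not specify the exact relation.
    """
    n_pixels = len(samples[0])
    raw = []
    for i in range(n_pixels):
        variance = pvariance([sample[i] for sample in samples])
        raw.append(1.0 / (1.0 + variance))
    total = sum(raw)
    return [w / total for w in raw]   # normalized so the weights sum to 1
```

With weights of this kind, a weighted correlation score is dominated by the parts of the character that were reproducible during learning.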
The method described herein improves classification accuracy by improving the effectiveness or robustness of the underlying normalized correlation operation. In one embodiment this is achieved by partitioning each unknown input character into several pre-defined overlapping regions. Each region is evaluated independently against a library of template regions. A normalized correlation operation is then performed between the unknown input character region and each of the character template regions defined in the character library. Doing so provides two substantial benefits over prior art methods. First, portions of the character that may be obscured or noisy in a systematic way are removed from the correlation operation thus minimizing their detrimental impact on the overall classification of the character. Second, the remaining portions of the character, those without obscuration, are weighted more heavily than they otherwise would be, thus improving the degree of correlation with the actual character and increasing the margin between the actual character and the next most likely character. In the simplest implementation, the portion of the character that yields the lowest correlation score can be defined as the most likely portion of the character containing an obscuration or imaging degradation and its effects minimized by the approach described.
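The simplest implementation mentioned above, in which the lowest scoring region is treated as the likely obscured portion, can be sketched as follows. The region layout (overlapping horizontal bands given as row ranges), the averaging of the remaining scores, and the function names are assumptions made for illustration; the text specifies only that the regions are pre-defined and overlapping.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = sqrt(sum((x - mean_a) ** 2 for x in a) *
               sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0   # flat regions carry no information


def crop(image, top, bottom):
    """Flatten rows top..bottom-1 of a 2-D image (list of rows)."""
    return [p for row in image[top:bottom] for p in row]


def region_score(unknown, template, regions):
    """Correlate region by region, discard the worst region, average the rest."""
    scores = sorted(ncc(crop(unknown, t, b), crop(template, t, b))
                    for t, b in regions)
    kept = scores[1:] if len(scores) > 1 else scores
    return sum(kept) / len(kept)


def classify(unknown, library, regions):
    """Return the library character whose template yields the best region score."""
    return max(library,
               key=lambda name: region_score(unknown, library[name], regions))
```

Here `library` maps each character to a template image of the same size as the unknown character, and `regions` might be `[(0, 3), (1, 5), (3, 6)]` for a six-row character. Discarding the weakest region keeps a locally overwritten part of the character from lowering the score of the correct template.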
In image processing and character recognition, proper orientation of the image on the document to be processed is advantageous. One of the parameters to which template based image processing operations are sensitive is the skew of the image in the image field. The present invention provides for pre-processing of images to eliminate skew and rotation. The processes of the present invention provide for consistent character registration and convert inverse type to normal type to simplify processing.