Various approaches have been applied to improve the classification accuracy of optical character recognition (OCR) methods. The present method relates generally to optical character recognition and more specifically to a technique for recognizing character strings in grayscale images where such strings may be of poor contrast, where characters may vary in position or rotation with respect to other characters in the string, or where characters in the string may be partially obscured.
Different challenges are posed in many industrial machine vision character reading applications, such as semiconductor wafer serial number identification, semiconductor chip package print character verification, vehicle tire identification, license plate reading, etc. In these applications, the font, size, and character set are well defined, yet the images may be low in contrast; individual characters or groups of characters may be skewed in rotation, misaligned in position, or both; characters may be partially obscured; and the image may be acquired from objects under varying lighting conditions, imaging system distortions, etc. The challenge in these cases is to achieve highly accurate, repeatable, and robust character reading results.
Character recognition in digital computer images is an important machine vision application. Prior art optical character recognition methods work well (i.e., achieve high classification accuracy) when image contrast is sufficient to separate, or segment, the text from the background. In applications such as document scanning, the illumination and optical systems are designed to maximize signal contrast so that foreground (text) and background separation is easy. Furthermore, conventional approaches require that the characters be presented in their entirety and not be obscured or corrupted to any significant degree. While this is possible with binary images acquired from a scanner or grayscale images acquired from a well-controlled, low-noise image capture environment, it is not possible in a number of machine vision applications such as parts inspection, semiconductor processing, or circuit board inspection. These industrial applications are particularly difficult to deal with because of poor contrast or character obscuration, and they suffer a significant degradation in classification accuracy because of the poor characteristics of the input image. The method described herein utilizes two approaches to improve classification accuracy: (1) region-based hit-or-miss character correlation and (2) field context information.
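As a minimal illustrative sketch only, and not the implementation of the method described herein, the two approaches might be combined as follows. The sketch assumes bright characters on a darker background, assumes each character template is a small mask whose cells are marked hit (1, expected stroke), miss (0, expected background), or don't-care (-1), and assumes the correlation score is the mean intensity over the hit region minus the mean intensity over the miss region. Field context is assumed to be a set of characters permitted at a given position in the string. The names hit_miss_score, classify, TEMPLATES, and allowed are hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence

Patch = Sequence[Sequence[float]]  # grayscale pixel intensities
Mask = Sequence[Sequence[int]]     # 1 = hit, 0 = miss, -1 = don't care


def hit_miss_score(patch: Patch, mask: Mask) -> float:
    """Mean intensity in the hit region minus mean intensity in the miss region.

    Assumed scoring rule for illustration. Because it compares region means,
    it stays positive for a matching template even when contrast is low.
    """
    hit_sum = miss_sum = 0.0
    hit_n = miss_n = 0
    for patch_row, mask_row in zip(patch, mask):
        for pixel, label in zip(patch_row, mask_row):
            if label == 1:
                hit_sum += pixel
                hit_n += 1
            elif label == 0:
                miss_sum += pixel
                miss_n += 1
            # label == -1: pixel is ignored (e.g. a known obscured area)
    if hit_n == 0 or miss_n == 0:
        return 0.0
    return hit_sum / hit_n - miss_sum / miss_n


def classify(patch: Patch,
             templates: Dict[str, Mask],
             allowed: Optional[Iterable[str]] = None) -> str:
    """Return the best-scoring character, restricted by field context if given."""
    allowed_set = None if allowed is None else set(allowed)
    candidates = [c for c in templates
                  if allowed_set is None or c in allowed_set]
    if not candidates:
        raise ValueError("no template is permitted at this field position")
    return max(candidates, key=lambda c: hit_miss_score(patch, templates[c]))


# Hypothetical 3x3 templates, far smaller than any real font.
TEMPLATES: Dict[str, Mask] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "1": [[1, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

# Low-contrast vertical stroke: background 100, stroke 110.
patch = [[100, 110, 100],
         [100, 110, 100],
         [100, 110, 100]]

unconstrained = classify(patch, TEMPLATES)
digits_only = classify(patch, TEMPLATES, allowed="0123456789")
print(unconstrained, digits_only)
```

In this toy case the unconstrained classifier prefers "I", while a field position known to hold a digit yields "1". The point is only that context can resolve characters whose correlation scores are close; the actual region definitions and scoring used by the method may differ.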
In the preferred embodiment, the invention described herein is particularly well suited for optical character recognition on text strings with poor contrast and partial character obscuration, as is typically the case in the manufacture of silicon wafers. Many semiconductor manufacturers now include a vendor code on each wafer for identification purposes and to monitor each wafer as it moves from process to process. The processing of silicon wafers involves many steps such as photolithographic exposure, etching, baking, and various chemical and physical processes. Each of these processes has the potential for corrupting the vendor code. Usually the corruption results in poor contrast between the characters and the background for some portion of the vendor code. In more severe cases, some of the characters may be photolithographically overwritten (exposed) with the pattern of an electronic circuit. This type of obscuration is difficult, if not impossible, to accommodate with prior art methods. Another possibility is that the vendor code will be written a character at a time (or in character groups) as processes accumulate. This can result in characters within the text string that are skewed or rotated with respect to the alignment of the overall text string.