Two-dimensional (2D) matrix symbols are becoming increasingly popular in automated identification applications because of their compact size, large data capacity, and built-in error detection and correction. The encoded information is represented as a binary pattern arranged as a 2D matrix of lines, dots, or squares. Characteristic patterns are appended to the matrix so that it can be located and distinguished easily during decoding. These properties make it possible to select a 2D symbology suited to a specific ID application, such as semiconductor wafer marking or document labeling.
For example, the Data Matrix symbology is a popular choice in wafer ID applications, as described in the International Symbology Specification—Data Matrix, AIM International, Inc., which is incorporated herein by reference. Each symbol can be considered to be made up of three structural elements: (1) a characteristic, symbology-specific finder pattern; (2) a timing pattern; and (3) the data region, in which the binary pattern representing the encoded data is placed. A matrix of 8 rows and 32 columns has been adopted as a standard by SEMI (Semiconductor Equipment and Materials International) of Mountain View, Calif., in standard T7-0997, which is incorporated herein by reference.
The matrix defined in the SEMI standard measures 4.00 mm wide by 1.00 mm high, with a dot spacing of 125 μm. For round 300 mm diameter wafers, the standard instructs users to imprint the matrix symbol at approximately 5.0±0.1 degrees from the orientation fiducial axis, just outside the outer periphery of the fixed quality area (FQA), at a distance of 148.95±0.15 mm from the wafer center. According to the SEMI specification, a “cell” is an area in which a dot may be placed to represent binary data, and a “dot” is “a localized region with a reflectance which differs from that of the surrounding surface.” A minimum contrast of 30% is required. The location reference point is defined as “the physical centerpoint of the cell common to the primary border row and the center alignment bar.” In the SEMI symbol, the center alignment bar comprises a line of solid dots abutting a line of alternately filled and empty cells. The standard tolerates some misalignment of dots, specifying it as no more than 20 μm for dots having a circular diameter or square edge of no less than 100 μm ± 10 μm. Similar specifications for symbols, locations, and tolerances apply to bar codes; see, for example, the Guidelines for Producing Quality Symbols, which covers universal product codes (UPC), reduced space symbology (RSS), and stacked bar codes and is available from the Uniform Code Council, Inc., of Lawrenceville, N.J.
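The dimensional figures above can be checked with a few lines of arithmetic. The Python sketch below derives the 4.00 mm × 1.00 mm extent from the 8 × 32 cell count and 125 μm pitch, converts the polar placement (148.95 mm at 5°) to wafer coordinates, and tests a single dot against the misalignment and size limits. It is illustrative only: the angle convention (counterclockwise from a +x fiducial axis), the reading of the dot-size limit as a nominal 100 μm with a ±10 μm tolerance, and the treatment of misalignment as a Euclidean center offset are assumptions, and the function names are hypothetical.

```python
import math

# Nominal values quoted from the SEMI T7-0997 description above.
ROWS, COLS = 8, 32
PITCH_MM = 0.125          # 125 um dot spacing
RADIUS_MM = 148.95        # distance of the symbol from the wafer center
ANGLE_DEG = 5.0           # angle from the orientation fiducial axis
MAX_MISALIGN_UM = 20.0    # maximum allowed dot misalignment
DOT_NOMINAL_UM = 100.0    # assumed nominal dot diameter / square edge
DOT_TOL_UM = 10.0         # assumed +/- tolerance on the dot size


def symbol_size_mm(rows=ROWS, cols=COLS, pitch=PITCH_MM):
    """Overall symbol extent (width, height): cell count times pitch."""
    return cols * pitch, rows * pitch


def nominal_location_xy(radius=RADIUS_MM, angle_deg=ANGLE_DEG):
    """Convert the polar placement to wafer (x, y) coordinates in mm.

    Assumption: the fiducial axis is the +x axis and the angle is measured
    counterclockwise; the standard's actual convention may differ.
    """
    a = math.radians(angle_deg)
    return radius * math.cos(a), radius * math.sin(a)


def dot_within_tolerance(dx_um, dy_um, size_um):
    """Check one dot against the misalignment and size limits.

    Assumption: misalignment is the Euclidean offset of the dot center
    from the ideal cell center.
    """
    misaligned = math.hypot(dx_um, dy_um) > MAX_MISALIGN_UM
    size_ok = abs(size_um - DOT_NOMINAL_UM) <= DOT_TOL_UM
    return size_ok and not misaligned


if __name__ == "__main__":
    print(symbol_size_mm())                     # (4.0, 1.0)
    x, y = nominal_location_xy()
    print(f"x = {x:.2f} mm, y = {y:.2f} mm")
    print(dot_within_tolerance(10, 10, 105))    # True: ~14.1 um offset, size in range
```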
In a typical application, a given data string (tag) is encoded into a binary pattern by a symbology-specific mathematical transformation. The data string may include alphanumeric identification symbols, which are encoded together with suitable error detection and correction codes (e.g., convolutional codes, CRC, Reed-Solomon). The binary pattern is mapped onto the data region of the 2D symbol, the characteristic finder and timing patterns are appended, and the resulting symbol pattern is marked onto the item being tagged. The marking technique depends on the application: for example, laser marking is used for direct marking on semiconductor wafers, while ink-based printers are used for document labels.
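A toy version of this encode, map, and append pipeline is sketched below in Python, assuming an 8 × 32 grid like the SEMI symbol. It is not the Data Matrix ECC 200 algorithm: it uses a CRC-32 from Python's standard `zlib` module for error detection only (no Reed-Solomon correction), and the length prefix, row-major placement, and finder/timing layout are hypothetical simplifications chosen for readability. A matching `decode()` is included so the round trip can be checked.

```python
import zlib

ROWS, COLS = 8, 32  # SEMI T7 matrix size; a 1-cell border leaves a 6 x 30 data region


def to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def encode(tag: str, rows: int = ROWS, cols: int = COLS) -> list[list[int]]:
    """Toy encoder (1 = dark cell) following the steps described above."""
    raw = tag.encode("ascii")
    body = bytes([len(raw)]) + raw                 # hypothetical length prefix + tag
    crc = zlib.crc32(body).to_bytes(4, "big")      # error *detection* only
    bits = to_bits(body + crc)
    capacity = (rows - 2) * (cols - 2)
    if len(bits) > capacity:
        raise ValueError(f"{len(bits)} bits exceed the {capacity}-cell data region")
    bits += [0] * (capacity - len(bits))           # pad unused cells as light

    grid = [[0] * cols for _ in range(rows)]
    # Map the binary pattern row by row onto the interior (data region).
    cells = iter(bits)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            grid[r][c] = next(cells)
    # Append an L-shaped finder: solid left column and bottom row.
    for r in range(rows):
        grid[r][0] = 1
    grid[rows - 1] = [1] * cols
    # Append timing patterns: alternating cells along the top row and right column.
    for c in range(cols):
        grid[0][c] = 1 - c % 2
    for r in range(rows - 1):
        grid[r][cols - 1] = 1 - (rows - 1 - r) % 2
    return grid


def decode(grid: list[list[int]]) -> str:
    """Inverse of encode(): read the data region, then verify the CRC."""
    rows, cols = len(grid), len(grid[0])
    bits = [grid[r][c] for r in range(1, rows - 1) for c in range(1, cols - 1)]
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits) - 7, 8))
    n = data[0]
    body, crc = data[:1 + n], data[1 + n:5 + n]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        raise ValueError("CRC mismatch: data region is damaged")
    return body[1:].decode("ascii")
```

For the tag "WAFER-0042", flipping the data cell `g[2][5]` makes `decode()` raise a CRC error; a real symbology would instead use its error-correction code to repair such damage.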
Sophisticated decoding algorithms are required to extract the information encoded in a 2D symbol. Decoders based on machine vision are increasingly used for this purpose because of their relative speed and robustness, i.e., their ability to detect the encoded information correctly under sub-optimal conditions. Machine vision-based scanners typically use the following general approach:
(1) A 2D image of the surface on which the symbol is marked is acquired, for example using a conventional solid-state (CCD) camera.
(2) The acquired image is analyzed using a decoding algorithm consisting of two steps:
    (i) locate the rectangular region that contains the pattern, and
    (ii) decipher the binary pattern and extract the encoded data string.
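The two decoding steps can be illustrated with a deliberately naive Python sketch. It assumes the image has already been binarized (1 = dark), is axis-aligned and free of distortion, and that the grid dimensions are known in advance; under those assumptions, step (i) reduces to a bounding box of dark pixels and step (ii) to sampling each cell center. The helpers `render`, `locate`, and `sample` are hypothetical names, and this is not how a production machine-vision locator works.

```python
def render(grid, scale=5, margin=3):
    """Synthesize an ideal binary image of a cell grid (test helper only)."""
    h = len(grid) * scale + 2 * margin
    w = len(grid[0]) * scale + 2 * margin
    image = [[0] * w for _ in range(h)]
    for y in range(margin, h - margin):
        for x in range(margin, w - margin):
            image[y][x] = grid[(y - margin) // scale][(x - margin) // scale]
    return image


def locate(image):
    """Step (i), naive: bounding box (top, left, bottom, right) of dark pixels."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not ys:
        return None
    return ys[0], xs[0], ys[-1], xs[-1]


def sample(image, box, rows, cols):
    """Step (ii): read the pixel at each cell center to recover the bit matrix."""
    top, left, bottom, right = box
    cell_h = (bottom - top + 1) / rows
    cell_w = (right - left + 1) / cols
    return [[image[int(top + (r + 0.5) * cell_h)][int(left + (c + 0.5) * cell_w)]
             for c in range(cols)]
            for r in range(rows)]
```

The bounding box matches the symbol only because the finder and timing cells place dark modules on all four edges; rotation, noise, or a missing edge module would defeat this shortcut.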
The robustness of the decoding algorithm depends largely on three factors that influence the appearance of the symbol: (1) the marking technique used, (2) the surface on which the symbol is printed, and (3) the illumination and optics used to acquire the 2D image. These factors cause the appearance of the symbol in real-world applications to deviate significantly from the ideal binary pattern (e.g., as set forth in the AIM specification). A typical image of a 2D symbol suffers from problematic artifacts, such as geometric distortion of individual data elements, a non-uniform background, and poor overall image quality.
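The effect of a non-uniform background can be shown on a single synthetic scan line. In the Python sketch below (all values hypothetical), the background brightens linearly from left to right and dark dots lower the intensity by a fixed amount. A single global threshold misclassifies cells on the dim side, while a simple local-mean threshold recovers the pattern. The local-mean rule is a generic adaptive-thresholding technique used here for illustration, not a method taken from this document.

```python
def scanline(bits, base=40.0, slope=15.0, contrast=30.0):
    """Synthetic intensities: a linear background ramp (uneven illumination)
    minus a fixed contrast wherever a dark dot (bit 1) is present."""
    return [base + slope * x - contrast * b for x, b in enumerate(bits)]


def global_threshold(values):
    """Classify as dark (1) anything below the mean of the whole line."""
    t = sum(values) / len(values)
    return [1 if v < t else 0 for v in values]


def local_mean_threshold(values, radius=1):
    """Classify as dark (1) anything below the mean of its own neighborhood;
    the window is clipped at the ends of the line."""
    out = []
    for x, v in enumerate(values):
        window = values[max(0, x - radius): x + radius + 1]
        out.append(1 if v < sum(window) / len(window) else 0)
    return out


if __name__ == "__main__":
    bits = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
    line = scanline(bits)
    print("global:", global_threshold(line))
    print("local: ", local_mean_threshold(line))
```

The window `radius` is a tuning assumption; a window that is small relative to long runs of same-colored cells can misclassify those runs as well.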
Existing techniques for locating 2D matrices are based on “connected component analysis” and are prone to errors when applied to 2D symbols that have been distorted as described above. The present state of the art does not permit 2D symbols to be located precisely while also handling a wide variety of distortions. The connected component-based approach, combined with intensity area correlation to locate the matrix, is very susceptible to marking variations that make the elements of the symbol finder pattern appear distorted. For example, scratches and smears along the symbol can cause contiguous data modules along the finder pattern and within the symbol (e.g., in standardized Data Matrix symbols) to appear separated by “breaks” between adjacent modules.
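This fragility is easy to reproduce. The Python sketch below labels 4-connected dark regions with a breadth-first search, a generic connected-component algorithm rather than any particular prior-art implementation, and applies it to a hypothetical 6 × 8 image containing only a solid L-shaped finder. A one-pixel “scratch” in the bottom bar splits the finder into two components, so a locator that expects a single L-shaped component no longer finds it. The helper names `l_finder` and `connected_components` are hypothetical.

```python
from collections import deque


def l_finder(h, w):
    """Binary image (1 = dark) of a solid L: left column plus bottom row."""
    image = [[0] * w for _ in range(h)]
    for y in range(h):
        image[y][0] = 1
    image[h - 1] = [1] * w
    return image


def connected_components(image):
    """Label 4-connected dark regions with BFS; return a list of pixel sets."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            comp, queue = set(), deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components


if __name__ == "__main__":
    clean = l_finder(6, 8)
    scratched = [row[:] for row in clean]
    scratched[5][3] = 0  # a one-module "break" in the bottom finder bar
    print(len(connected_components(clean)), len(connected_components(scratched)))
```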
Similarly, known machine vision-based decoders cannot handle a wide variety of symbols independently of variations in marking and symbol quality. As a result, different processes must be used depending on the symbology to be recognized. Automatic selection among several computational methods may be possible in real time; however, such selection would necessarily be slow.