Introduction
In electronic digital image processing hardware, images are stored in discrete memory devices. The image is often segmented into an array of values, where each memory location corresponds to a particular spatial coordinate point and the value at that memory location, called a picture element (pixel), corresponds to the brightness of the image at that coordinate point. FIG. 1A shows an example of an object in an image where the brightness is given by numbers at each pixel coordinate point.
Image and Template Correlation
One common technique to automatically locate objects in an image of a machine vision system is to use a correlation or convolution. There are several types of image correlation methods including convolution, normalized correlation, the least mean squares error, and the least mean absolute error. A definition of a correlation method requires the determination and use of a template or kernel which is a separate small image with the same shape as the object to be located. FIG. 1B shows a template shaped like the object shown in the image of FIG. 1A. Like the object, the template may be represented by spatial coordinate points with a brightness value for each point. The template is selectively displaced and moved from location to location around a region of interest in the image. At each new template location in the image, the sum of products is computed for the value of each template pixel with the corresponding value of each image pixel at a common spatial coordinate point. FIG. 1C shows one location of the template in FIG. 1B displaced on the image. In this case there is no overlap at that displacement, and the sum of products is zero. The computational output of the correlation or convolution is at a maximum at the location where the shape of the template pattern most closely matches the shape of a pattern in the image. FIG. 1D shows the correlation for all possible displacements of the template across the image. The numeric values are rather large, so FIG. 1D shows only an approximate and relative indication of the correlation by the intensity of shading.
The formula for a discrete two-dimensional convolution is given by $$C(x, y) = \sum_{u} \sum_{v} I(x - u,\, y - v)\, K(u, v)$$ where I is an image, K is a kernel, and x and y are image coordinates defining a spatial coordinate point. The summation over u and v ranges over the template. In practice, the template is smaller than the image containing the object whose location is being determined.
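The sum-of-products search described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any particular system: the function names `correlate` and `peak_location`, the list-of-lists image layout, and the sample pixel values are all assumptions made for the example. The sketch uses the correlation form, in which the template is not flipped, because that is what the prose describes; it equals a convolution whose kernel is the template rotated by 180 degrees. Only displacements where the template lies entirely inside the image are evaluated.

```python
def correlate(image, template):
    """Return the sum of products for every displacement at which the
    template fits entirely inside the image (row-major lists of lists)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            total = 0
            for v in range(th):
                for u in range(tw):
                    # Template pixel (u, v) meets image pixel (x + u, y + v).
                    total += image[y + v][x + u] * template[v][u]
            row.append(total)
        response.append(row)
    return response


def peak_location(response):
    """Return (x, y) of the largest response value."""
    _, y, x = max((val, y, x)
                  for y, row in enumerate(response)
                  for x, val in enumerate(row))
    return x, y


# Hypothetical 5x4 image with a bright 2x2 object, and a matching template.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 9], [9, 9]]

response = correlate(image, template)
print(peak_location(response))  # displacement with the largest sum of products
```

In this example the displacement at the top-left corner has no overlap with the object, so its sum of products is zero, while the peak sits where the template covers the object exactly.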
Normalized correlation is a well-known method similar to correlation, except that the value of each element of the template is multiplied by a constant scale factor, and a constant offset is added. At each template displacement the scale factor and offset are independently adjusted to give a minimum error in the correlation of the template at each image location. The normalized correlation method in template matching is covered in detail in an article entitled "Alignment and Gauging Using Normalized Correlation Search" by William Silver, in VISION '87 Conference Proceedings, pp. 5-33 to 5-55, which is incorporated herein by reference.
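As a rough illustration of the scale-and-offset idea, the sketch below fits a scale factor and an offset to one image window by ordinary least squares and also computes the textbook normalized correlation coefficient. It is an assumption that this textbook form is adequate for illustration; the cited article's exact formulation is not reproduced here. The function names and the sample values are hypothetical, and each window is passed as a flat list of pixel values.

```python
import math


def best_scale_offset(window, template):
    """Least-squares scale a and offset b minimizing sum((a*t + b - w)**2)."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    cov = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    var_t = sum((t - mean_t) ** 2 for t in template)
    if var_t == 0:
        return 0.0, mean_w  # flat template: only the offset can be fitted
    a = cov / var_t
    return a, mean_w - a * mean_t


def normalized_correlation(window, template):
    """Correlation coefficient in [-1, 1]; unaffected by a positive scale
    factor or a constant offset applied to either argument."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    cov = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    var_w = sum((w - mean_w) ** 2 for w in window)
    var_t = sum((t - mean_t) ** 2 for t in template)
    if var_w == 0 or var_t == 0:
        return 0.0  # coefficient is undefined for a flat patch; report 0
    return cov / math.sqrt(var_w * var_t)


# Hypothetical data: the window is the template scaled by 2 and offset by 10.
template_values = [1, 2, 3, 4]
window_values = [12, 14, 16, 18]

print(best_scale_offset(window_values, template_values))
print(normalized_correlation(window_values, template_values))
```

Because the window is an exact scaled and offset copy of the template, the fitted error is zero and the coefficient reaches its maximum, which is why the method tolerates uniform changes in brightness and contrast.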
In the least mean squared error method, each template point is subtracted from the corresponding image point; each difference is squared; and the average of the squared differences is computed. The formula for the least mean squared error is $$E(x, y) = \frac{1}{N} \sum_{u} \sum_{v} \left[ I(x - u,\, y - v) - K(u, v) \right]^{2}$$ where N is the number of pixels in the kernel. The computational output of the least mean squared error is at a minimum where the template pattern matches a pattern in the image. In the least mean absolute error method, each template point is subtracted from the corresponding image point; the absolute value of each difference is computed; and the average of the absolute differences is computed. The formula for the least mean absolute error is $$E(x, y) = \frac{1}{N} \sum_{u} \sum_{v} \left| I(x - u,\, y - v) - K(u, v) \right|$$ The computational output of the least mean absolute error is also at a minimum where the patterns match.
The techniques described above are substantially the same in the sense that a template, which is itself a gray level image, is displaced from location to location about a corresponding gray level image containing an object whose coordinate location within the image is of interest. At each location a function is applied to neighboring image pixel values and the corresponding template values at common coordinate points. The result is another image where each pixel at a coordinate point is a single number that represents how well the template fits the object in the image at that point.
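The two error measures can be illustrated with a short Python sketch. The names `mean_squared_error`, `mean_absolute_error`, and `best_match`, and the sample image, are assumptions made for this example. As in the prose, the template is compared point by point without flipping, and the best match is the displacement with the smallest error rather than the largest response.

```python
def mean_squared_error(image, template, x, y):
    """Average squared difference with the template's corner at (x, y)."""
    th, tw = len(template), len(template[0])
    total = sum((image[y + v][x + u] - template[v][u]) ** 2
                for v in range(th) for u in range(tw))
    return total / (th * tw)


def mean_absolute_error(image, template, x, y):
    """Average absolute difference with the template's corner at (x, y)."""
    th, tw = len(template), len(template[0])
    total = sum(abs(image[y + v][x + u] - template[v][u])
                for v in range(th) for u in range(tw))
    return total / (th * tw)


def best_match(image, template, error_fn):
    """Return the (x, y) displacement that minimizes error_fn."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    positions = [(x, y)
                 for y in range(ih - th + 1)
                 for x in range(iw - tw + 1)]
    return min(positions, key=lambda p: error_fn(image, template, p[0], p[1]))


# Hypothetical image: a slightly noisy bright object on a dim background.
image = [
    [1, 1, 1, 1, 1],
    [1, 1, 8, 9, 1],
    [1, 1, 9, 8, 1],
    [1, 1, 1, 1, 1],
]
template = [[9, 9], [9, 9]]

print(best_match(image, template, mean_squared_error))
print(best_match(image, template, mean_absolute_error))
```

Both measures pick the same displacement here; they differ mainly in how heavily a few large pixel differences are penalized.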
Binary Vector Correlation
Vector correlation or convolution provides an alternative approach to the correlation methods discussed above. In vector correlation the image and selected template are composed of pixels which are vectors. The theory behind binary vector correlation is covered in a paper entitled "Vector Morphology and Iconic Neural Networks" by S. S. Wilson in IEEE Transactions on Systems, Man, and Cybernetics, November/December, 1989, vol. 19, no. 6, pp. 1636-1644, which is incorporated by reference. A similar technique was further discussed in the paper entitled "Teaching network connections for real-time object recognition", by S. S. Wilson in Neural and Intelligent Systems Integration, pp. 135-160, Wiley-Interscience, 1991. Briefly, the most common form of binary vector correlation consists of transforming a gray level image to several binary images, where the composite of binary images represents a vector in the sense that each pixel in the vector image has several components--each from one of the binary images. Next, a vector template is defined for the purpose of recognizing a pattern. The vector template also consists of the same number of components as the vector image.
The position of the vector template is displaced and moved from location to location around the region of interest in the image. At each location, the sum of inner products (or dot products) is computed, each taken between a vector pixel in the template and the vector pixel in the image at the corresponding coordinate point. In mathematical terms, the formula for a discrete two-dimensional vector convolution is given by $$C(x, y) = \sum_{u} \sum_{v} \mathbf{I}(x - u,\, y - v) \cdot \mathbf{K}(u, v)$$ where I is a vector image and K is a vector kernel, and x and y are image coordinates. The summation over u and v ranges over the template.
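For binary vectors, each inner product simply counts the components that are set in both the template pixel and the image pixel. The sketch below evaluates the sum of inner products at a single displacement. Representing a vector pixel as a Python tuple, the constant names, and the tiny sample arrays are assumptions made for illustration only.

```python
def dot(a, b):
    """Inner product of two vector pixels given as equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def vector_response(vimage, vtemplate, x, y):
    """Sum of inner products with the template's corner placed at (x, y)."""
    return sum(dot(vimage[y + v][x + u], vtemplate[v][u])
               for v in range(len(vtemplate))
               for u in range(len(vtemplate[0])))


# Hypothetical four-component binary vectors.
Z = (0, 0, 0, 0)  # no component set
A = (1, 0, 0, 0)
B = (0, 1, 0, 0)
C = (0, 0, 1, 0)
D = (0, 0, 0, 1)

vimage = [
    [Z, A, A, Z],
    [D, Z, Z, C],
    [Z, B, B, Z],
]
vtemplate = [[A, A]]

print(vector_response(vimage, vtemplate, 1, 0))  # both template pixels match
print(vector_response(vimage, vtemplate, 0, 2))  # no component in common
```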
A detailed description of one technique of vector correlation follows. Starting with an input image, the first step is to form another image called the horizontal finite difference by subtracting from the value of a pixel of the input image, the value of a neighboring pixel displaced a small distance to the right. The resulting image will contain large positive or negative values around those coordinate points where there is a significant vertical edge. A positive value in the horizontal finite difference image is called an east edge and represents an edge that decreases in intensity from left to right. A negative value in the horizontal finite difference image is called a west edge and represents an edge that increases in intensity from left to right.
The second step is to form another image called the vertical finite difference by subtracting from the value of a pixel of the input image, the value of a neighboring pixel displaced a small distance upward. The resulting image will contain large positive or negative values around those coordinate points where there is a significant horizontal edge. A positive value in the vertical finite difference image is called a north edge and represents an edge that decreases in intensity in the upward direction. A negative value in the vertical finite difference image is called a south edge and represents an edge that increases in intensity in the upward direction.
The third step in binary vector correlation is to form a binary vector image where each pixel contains a vector comprised of four binary numbers labeled N, S, E, and W which correspond to the four compass directions. The N binary number is computed by comparing the vertical finite difference to a small positive number called a threshold, and associating a binary 1 for those values that exceed the threshold, and a binary zero otherwise. The S binary component is computed by comparing the vertical finite difference to the negative of the threshold, and associating a binary 1 for those values that are smaller than the negative of the threshold, and a binary zero otherwise. The E and W binary components are computed in a similar manner using the horizontal finite difference image.
The fourth and final step is to displace the position of the vector template from location to location within a region of interest in the original image. At each new location in the image, the sum of inner products is computed for the value of each vector pixel in the template with each corresponding vector pixel in the original image for a corresponding coordinate point.
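The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the method of any particular system: the function names, the displacement `d`, the threshold of 4, and the synthetic images are hypothetical. Rows are assumed to be numbered from the top, so the upward neighbor is the previous row, and a difference whose neighbor falls outside the image is assumed to be zero. For brevity the vector template is derived from a small gray level template with the same edge detection.

```python
def finite_differences(image, d=1):
    """Steps 1 and 2: horizontal and vertical finite differences."""
    h, w = len(image), len(image[0])
    horiz = [[image[y][x] - image[y][x + d] if x + d < w else 0
              for x in range(w)] for y in range(h)]
    vert = [[image[y][x] - image[y - d][x] if y - d >= 0 else 0
             for x in range(w)] for y in range(h)]
    return horiz, vert


def binary_vector_image(image, threshold, d=1):
    """Step 3: one (N, S, E, W) tuple of binary components per pixel."""
    horiz, vert = finite_differences(image, d)
    h, w = len(image), len(image[0])
    return [[(int(vert[y][x] > threshold),     # N: positive vertical difference
              int(vert[y][x] < -threshold),    # S: negative vertical difference
              int(horiz[y][x] > threshold),    # E: positive horizontal difference
              int(horiz[y][x] < -threshold))   # W: negative horizontal difference
             for x in range(w)] for y in range(h)]


def vector_correlate(vimage, vtemplate):
    """Step 4: sum of inner products at every displacement that fits."""
    ih, iw = len(vimage), len(vimage[0])
    th, tw = len(vtemplate), len(vtemplate[0])
    return [[sum(p * q
                 for v in range(th) for u in range(tw)
                 for p, q in zip(vimage[y + v][x + u], vtemplate[v][u]))
             for x in range(iw - tw + 1)]
            for y in range(ih - th + 1)]


def peak(response):
    """Return ((x, y), score) for the largest response."""
    score, y, x = max((val, y, x)
                      for y, row in enumerate(response)
                      for x, val in enumerate(row))
    return (x, y), score


def square_image(size, top, left, side, value=9):
    """Synthetic test image: a bright square on a zero background."""
    img = [[0] * size for _ in range(size)]
    for y in range(top, top + side):
        for x in range(left, left + side):
            img[y][x] = value
    return img


image = square_image(6, 2, 2, 2)           # object at column 2, row 2
template_gray = square_image(4, 1, 1, 2)   # same object at column 1, row 1

vimage = binary_vector_image(image, threshold=4)
vtemplate = binary_vector_image(template_gray, threshold=4)
print(peak(vector_correlate(vimage, vtemplate)))
```

In this example the top-right corner of the square sets both the N and the E component at the same pixel, which shows that a vector pixel can carry more than one non-zero component.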
FIG. 2A is an example of a vector image after edge detection using both horizontal finite difference and vertical finite difference. Although each coordinate point in the image should illustrate a vector it is easier to illustrate only the label of the non-zero component. A coordinate point having a blank pixel represents a vector where all elements are zero. Although in practice it is possible for two components to be non-zero such as N and W, none are shown by way of example in FIG. 2A. FIG. 2B is a corresponding vector template formed using both the horizontal finite difference image and the vertical finite difference image and a predetermined threshold. It is apparent that when the template in FIG. 2B is moved from location to location about the image in FIG. 2A, the correlation response as roughly indicated in FIG. 2C will result.
Benefits of Vector Correlation
Vector correlation is very accurate because the correlation is with the edges of a pattern. Referring to FIG. 2C, a slight offset of the template from the actual location of the pattern in the image means that most of the edges of the template completely miss contact with the pattern, which results in no correlation. However, as shown in FIG. 1D, there is a large area of partial contact of the image with the template if an ordinary correlation is used.
Vector correlation is very capable of locating degraded patterns in an image as long as the edges of the pattern are not severely degraded. A degraded image may contain a large amount of clutter surrounding the pattern of interest, or may have missing pieces within the pattern. An ordinary correlation may fail to distinguish the pattern from the surrounding clutter.
In an ordinary correlation a template image is used. Thus, template matching will fail to give accurate results if the visual nature of the object is substantially different from the object used as the template. Vector correlation is superior to ordinary correlation because only the important edge features of an object are used in identifying the object.
Problems with Vector Correlation
Vector correlation is computationally intensive because of the extra steps in forming and operating with vector images. First, there is an increased requirement for memory storage in order to store the finite differences and computed edges. Second, the templates used in characterizing patterns to be found are vectors and are more complex to define. Third, diagonal edges are not represented accurately. A diagonal in the north and east direction must be represented by a template that has both the north and east edges represented at the same point. A diagonal point in a template is more difficult to handle. Finally, the method is restricted to a high performance parallel processor.
The use of a small finite difference coordinate displacement encompassing one or two units produces very narrow edges in the finite difference images. As the correlation template is moved over the vector image, narrow edges allow a strong localization of the correlation and lead to high accuracy in matching the template pattern to the image. However, since the pattern location is unknown, the template must, in general, be moved over a large number of positions before the pattern can be found. This is referred to as a fine grained search. The unknown pattern in this case might be lost if it is slightly rotated so that the template does not fit anywhere.
The finite difference can be made larger for simple images. For example, a finite difference coordinate displacement encompassing ten units would lead to edges that are ten pixels wide and allow rotated patterns to be found. Additionally, the pattern could be found by sampling the template over the image, not exhaustively, but in a coarse grid in steps encompassing up to ten units in both the horizontal and vertical directions. In this example, the search would be ten squared, or one hundred times faster. However, the accuracy, once the pattern is found, would be very poor.
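The hundredfold figure follows from counting template placements. A short Python check, assuming a hypothetical search region of 500 by 500 candidate positions and a helper named `count_positions` introduced only for this example:

```python
def count_positions(width, height, step):
    """Number of template placements when sampling every `step` units
    in both the horizontal and vertical directions."""
    return len(range(0, width, step)) * len(range(0, height, step))


fine = count_positions(500, 500, 1)     # exhaustive, fine grained search
coarse = count_positions(500, 500, 10)  # coarse grid in steps of ten units
print(fine, coarse, fine // coarse)
```

The ratio is the square of the step size whenever the region dimensions are multiples of the step, since the step is applied independently along each axis.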
A compromise would be to provide a coarse grained search to get a rough location of the pattern, and then a fine grained search to zero in on the pattern. The edge detection would have to be performed twice, once for the large finite differences, and once for the small finite differences. A large amount of memory storage and a high performance parallel processor are still required.
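The compromise can be sketched as a two-pass search. In the sketch below, `coarse_score` and `fine_score` are hypothetical callables standing in for the correlation response computed from the large and the small finite differences respectively; the function name `coarse_to_fine`, the synthetic score functions, and the pattern location are assumptions made purely for illustration.

```python
def coarse_to_fine(coarse_score, fine_score, width, height, step):
    """Locate a pattern with a coarse grid pass followed by a fine pass
    restricted to the neighborhood of the best coarse position."""
    # Coarse pass: sample every `step` units using the wide-edge response.
    grid = [(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)]
    cx, cy = max(grid, key=lambda p: coarse_score(p[0], p[1]))

    # Fine pass: every position within one step of the coarse hit,
    # scored with the narrow-edge response.
    nearby = [(x, y)
              for y in range(max(0, cy - step), min(height, cy + step + 1))
              for x in range(max(0, cx - step), min(width, cx + step + 1))]
    return max(nearby, key=lambda p: fine_score(p[0], p[1]))


# Synthetic stand-ins: the pattern is assumed to sit at (37, 52).
TRUE_X, TRUE_Y = 37, 52


def coarse_score(x, y):
    # Broad peak, as produced by edges about ten pixels wide.
    return max(0, 10 - max(abs(x - TRUE_X), abs(y - TRUE_Y)))


def fine_score(x, y):
    # Sharp peak, as produced by edges one pixel wide.
    return 1 if (x, y) == (TRUE_X, TRUE_Y) else 0


print(coarse_to_fine(coarse_score, fine_score, 100, 100, 10))
```

The coarse pass evaluates 100 positions and the fine pass at most 441, compared with 10,000 for an exhaustive fine grained search over the same region, at the cost of computing and storing two sets of edge images.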