Commonly, an image is formed to represent a scene of interest that can include one or more physical objects on a background. In computer vision, an image typically consists of a two-dimensional array of picture elements, called pixels. Pixels can be square, rectangular, or hexagonal, for example. Each pixel is associated with a gray value, and each gray value can range from 0 to 255, for example. The gray value of each pixel is determined by the visual properties of the portion of the scene represented by that pixel.
An important task in analyzing images is differentiating between an object and the background of an image, or more generally between the foreground and the background. The visual properties of an object are usually different from the visual properties of the background. Thus, the gray values of the portion of the image that represents the object will on average be different from the gray values of the portion of the image that represents the background.
It is known to differentiate an object from the background by comparing each pixel in an image with a single threshold gray value. If the gray value of the pixel is above the threshold gray value, then it can be categorized as object or foreground; if the gray value of the pixel is equal to or below the threshold gray value, then it can be categorized as background. Note that the threshold is static, non-adaptive, and is applied globally, in that each and every pixel in the image is compared with the same threshold.
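The single-threshold comparison described above can be sketched in a few lines. This is a minimal illustration only: the function name `threshold_image`, the sample gray values, and the threshold of 128 are assumptions made for the example, not part of any particular product.

```python
def threshold_image(image, threshold):
    """Compare every pixel with one global threshold.

    Returns a mask of the same shape: True means object/foreground
    (gray value above the threshold), False means background
    (gray value equal to or below the threshold).
    """
    return [[gray > threshold for gray in row] for row in image]


# Hypothetical 3x4 image: dark background on the left, bright object on the right.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 128, 199, 215],
]

mask = threshold_image(image, 128)
for row in mask:
    print(row)
```

Note that the pixel whose gray value equals the threshold (128) is labeled background, and that the function visits every pixel of the image exactly once.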
One drawback to this approach is that non-uniform illumination across an image, which can make one side of the scene darker than another, can easily cause object portions of the image to be misclassified as background portions, and vice versa. Another disadvantage is that each pixel of the image must be processed. Since computation time is proportional to the number of pixels processed, this approach can present a computational burden when images are large, or when many images must be processed rapidly, for example.
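The illumination problem can be made concrete with a small numeric sketch. Everything here is assumed for illustration: a single image row, object pixels with an ideal gray value of 100, background pixels with an ideal gray value of 40, a global threshold of 70, and an illumination gain that rises linearly from 50% on the left to 190% on the right.

```python
def threshold_row(row, threshold):
    """Label each pixel of one image row with a single global threshold."""
    return [gray > threshold for gray in row]


# Hypothetical ground truth: True marks object pixels, False marks background.
truth = [True, True, False, False, True, True, False, False]
ideal = [100 if is_object else 40 for is_object in truth]

# Assumed illumination gain in percent, darker on the left, brighter on the right.
gain_percent = [50, 70, 90, 110, 130, 150, 170, 190]
observed = [gray * gain // 100 for gray, gain in zip(ideal, gain_percent)]

labels = threshold_row(observed, 70)
misclassified = [i for i, (label, expected) in enumerate(zip(labels, truth))
                 if label != expected]

print(observed)       # gray values after uneven illumination
print(misclassified)  # indices where the global threshold gives the wrong label
```

Under uniform illumination the threshold of 70 separates the two classes perfectly. With the assumed gradient, the two object pixels on the dark side fall to or below the threshold and the last background pixel on the bright side rises above it, so all three are mislabeled.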
Another way to differentiate an object from the background is to start with a single pixel on the object/background boundary, and to compare each pixel in the neighborhood of that pixel to a single threshold gray value. If the neighborhood pixel is above the threshold, it is an object pixel; if it is below the threshold, it is a background pixel. The transition from object pixel to background pixel is the boundary, and it can be further localized with an interpolation step. The neighborhood pixel that is closest to the boundary is then selected, and each of its neighboring pixels is compared to the single threshold gray value. The process is repeated, resulting in a sequence of pixels that tracks the boundary of an object. An example of this technique is the Cognex Boundary Tracker, sold by Cognex Corporation, Natick, Mass.
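The neighborhood-by-neighborhood procedure can be sketched as follows. This is not the Cognex Boundary Tracker; it is a generic clockwise neighbor-tracing sketch under stated assumptions: the names `track_boundary` and `is_object` are hypothetical, the start pixel is an object pixel whose left-hand neighbor is background, the interpolation step is omitted, and tracing simply stops when the start pixel is reached again (which is adequate for a simple blob but can stop early on thin shapes).

```python
# 8-neighbour offsets (row, col) in clockwise order, starting from "west".
NEIGHBOURS = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
              (0, 1), (1, 1), (1, 0), (1, -1)]


def is_object(image, r, c, threshold):
    """Threshold one pixel on demand; pixels outside the image count as background."""
    inside = 0 <= r < len(image) and 0 <= c < len(image[0])
    return inside and image[r][c] > threshold


def track_boundary(image, start, threshold, max_steps=10_000):
    """Follow the object boundary clockwise, thresholding only nearby pixels."""
    boundary = [start]
    current = start
    search_from = 0  # index into NEIGHBOURS where the next scan begins
    for _ in range(max_steps):
        for k in range(8):
            d = (search_from + k) % 8
            nxt = (current[0] + NEIGHBOURS[d][0], current[1] + NEIGHBOURS[d][1])
            if is_object(image, nxt[0], nxt[1], threshold):
                break
        else:
            return boundary  # isolated pixel: no object neighbour found
        if nxt == start:
            return boundary  # closed the loop
        boundary.append(nxt)
        current = nxt
        # Resume scanning just past the background pixel examined last.
        search_from = (d + 7) % 8 if d % 2 == 0 else (d + 6) % 8
    return boundary


# Hypothetical 5x5 image with a bright 3x3 object on a dark background.
image = [[200 if 1 <= r <= 3 and 1 <= c <= 3 else 10 for c in range(5)]
         for r in range(5)]

boundary = track_boundary(image, start=(1, 1), threshold=128)
print(boundary)
```

Only pixels adjacent to the traced path are ever compared with the threshold; the interior pixel at (2, 2) is never added to the boundary.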
Although this method processes fewer pixels than the previous method, because it processes only the pixels in a narrow band around each object/background boundary rather than every pixel in the image, it also suffers from classification errors caused by non-uniform illumination across an image.
A third way to locate the boundary between object and background is to first perform "edge detection" on the image. An edge is usefully defined as a change in gray value from dark to light or from light to dark that can span many pixels. Since the gray values of the image that represent the object will on average be different from the gray values that represent the background, there is an extended band or chain of pixels that have gray values that transition between the gray values of the object and the background. This extended band or chain of pixels is called an "edge contour". Thus, an edge contour indicates the boundary between an object and the background.
In "edge detection", the entire image is processed so as to label each pixel of the image as being either on an edge or not on an edge. Since every pixel in the image is processed, this first step is very computationally expensive. Next, an "edge linking" step is performed wherein pixels on edges are combined into "edge contours" or "boundary chains" which define the boundary between object and background. This step is also computationally expensive, because every pixel that has been labeled as being on an edge must be processed to determine whether it can be included in a boundary chain. Moreover, this method does not provide sub-pixel accuracy.
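The two steps can be sketched as follows. The text does not name a specific edge operator or linking rule, so both are assumptions here: `detect_edges` uses a simple central-difference gradient with an assumed contrast limit, and `link_edges` groups 8-connected edge pixels into chains. Both function names are hypothetical.

```python
def detect_edges(image, min_contrast):
    """Label every interior pixel as edge / not edge (assumed gradient test)."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal gray-value change
            gy = image[r + 1][c] - image[r - 1][c]  # vertical gray-value change
            edges[r][c] = abs(gx) + abs(gy) >= min_contrast
    return edges


def link_edges(edges):
    """Group 8-connected edge pixels into chains (lists of (row, col))."""
    rows, cols = len(edges), len(edges[0])
    seen = set()
    chains = []
    for r in range(rows):
        for c in range(cols):
            if not edges[r][c] or (r, c) in seen:
                continue
            chain, stack = [], [(r, c)]
            seen.add((r, c))
            while stack:
                pr, pc = stack.pop()
                chain.append((pr, pc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and edges[nr][nc] and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            chains.append(chain)
    return chains


# Hypothetical 5x6 image: dark left half, bright right half.
image = [[10, 10, 10, 200, 200, 200] for _ in range(5)]

edges = detect_edges(image, min_contrast=100)
chains = link_edges(edges)
print(len(chains), sorted(chains[0]))
```

The first function touches every interior pixel and the second revisits every edge pixel, which mirrors the cost of the two steps. The resulting chain consists of whole-pixel coordinates only, so the boundary is located no more finely than one pixel.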