Image processing systems exist in the prior art for recognizing objects. Often these systems use histograms to perform this recognition. One common histogram method develops either a gray scale histogram or a color histogram from a (color) image containing an object. These histograms are then compared directly to histograms of reference images. Alternatively, features of the histograms are extracted and compared to features extracted from histograms of images containing reference objects.
The reference histograms or features of these histograms are typically stored in computer memory. The prior art often performs these methods to verify that the target object in the image is indeed the object that is expected, and, possibly, to grade/classify the object according to the quality of its appearance relative to the reference histogram. An alternative purpose could be to identify the target object by comparing the target image object histogram to the histograms of a number of reference images of objects.
In this description, identifying is defined as determining, given a set of reference objects or classes, which reference object the target object is or which reference class the target object belongs to. Classifying or grading is defined as determining that the target object is known to be a certain object and/or that the quality of the object is some quantitative value. Here, one of the classes can be a "reject" class, meaning that either the quality of the object is too poor, or the object is not a member of the known class. Verifying, on the other hand, is defined as taking the target to be a certain known object or class and simply determining whether this is true or false. Recognizing is defined as identifying, classifying, grading, and/or verifying.
Bulk items include any item that is sold in bulk in supermarkets, grocery stores, retail stores or hardware stores. Examples include produce (fruits and vegetables), sugar, coffee beans, candy, nails, nuts, bolts, general hardware, parts, and packaged goods.
In image processing, a digital image is an analog image from a camera that is converted to a discrete representation by dividing the picture into a fixed number of locations called picture elements and quantizing the value of the image at those picture elements into a fixed number of values. The resulting digital image can be processed by a computer algorithm to develop other images. These images can be stored in memory and/or used to determine information about the imaged object. A pixel is a picture element of a digital image.
Image processing and computer vision is the processing by a computer of a digital image to modify the image or to obtain from the image properties of the imaged objects such as object identity, location, etc.
A scene contains one or more objects that are of interest, together with the surroundings that are imaged along with the objects. These surroundings are called the background. The background is usually further away from the camera than the object(s) of interest.
Segmenting (also called figure/ground separation) is separating a scene image into separate object and background images. Segmenting refers to identifying those image pixels that are contained in the image of the object versus those that belong to the image of the background. The segmented object image is then the collection of pixels that comprises the object in the original image of the complete scene. The area of a segmented object image is the number of pixels in the object image.
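The definitions of a segmented object image and its area can be illustrated with a minimal sketch. It assumes a dark background and a simple brightness threshold, neither of which the text prescribes; the function names, the threshold value, and the sample scene are hypothetical.

```python
def segment_by_threshold(image, threshold):
    """Return a mask that is True where a pixel is brighter than the threshold.

    Treating bright pixels as "object" is an assumption made only for this
    sketch; the text does not specify a segmentation method.
    """
    return [[pixel > threshold for pixel in row] for row in image]


def object_area(mask):
    """Area of the segmented object image: the number of object pixels."""
    return sum(1 for row in mask for is_object in row if is_object)


# Hypothetical 4x4 gray scale scene: a bright 2x2 object on a dark background.
scene = [
    [10, 12, 11, 9],
    [10, 200, 210, 12],
    [11, 190, 205, 10],
    [9, 10, 12, 11],
]

mask = segment_by_threshold(scene, threshold=100)
area = object_area(mask)
print(area)
```

The mask plays the role of the segmented object image: it marks which pixels of the scene belong to the object, and counting them gives the area.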
Illumination is the light that illuminates the scene and the objects in it. Illumination of the whole scene directly determines the illumination of individual objects in the scene and therefore the light reflected by the objects and received by an imaging apparatus such as a video camera.
Ambient illumination is illumination from any light source except the special lights used specifically for imaging an object. For example, ambient illumination is the illumination due to light sources occurring in the environment such as the sun outdoors and room lights indoors.
Glare or specular reflection is the high amount of light reflected off a shiny object (a specular object, i.e., one exhibiting mirror-like properties, possibly only locally). The color of the glare is mostly that of the illuminating light (as opposed to the natural color of the object).
A feature of an image is defined as any property of the image that can be computationally extracted. Features typically have numerical values that can lie in a certain range, say, R0-R1. In the prior art, histograms are computed over a whole image or over windows (sub-images) in an image. A histogram of a feature of an image is a numerical representation of the distribution of feature values over the image or window. A histogram of a feature is developed by dividing the feature range, R0-R1, into M intervals (bins) and computing the feature for each image pixel. Simply counting how many image or window pixels fall in each bin gives the feature histogram.
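The histogram construction just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the bins are of equal width, a value equal to R1 is placed in the last bin, out-of-range values are skipped, and the feature is simply the gray value of each pixel. The function name and sample window are hypothetical.

```python
def feature_histogram(values, r0, r1, m):
    """Count how many feature values fall into each of m equal-width bins
    covering the range [r0, r1]."""
    counts = [0] * m
    width = (r1 - r0) / m
    for value in values:
        if value < r0 or value > r1:
            continue  # assumption: values outside the range are ignored
        # A value equal to r1 would index bin m, so clamp it to the last bin.
        index = min(int((value - r0) / width), m - 1)
        counts[index] += 1
    return counts


# Hypothetical 3x3 window of gray values; the feature is the gray value itself.
window = [[0, 10, 64], [128, 130, 200], [255, 250, 90]]
pixels = [p for row in window for p in row]

hist = feature_histogram(pixels, r0=0, r1=255, m=4)
print(hist)
```

Every pixel lands in exactly one bin, so the bin counts sum to the number of pixels in the window.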
Image features include, but are not limited to, color and texture. Color is a two-dimensional property, for example Hue and Saturation or other color descriptions (explained below) of a pixel, but often disguised as a three-dimensional property, i.e., the amount of Red, Green, and Blue (RGB). Various color descriptions are used in the prior art, including (1) the RGB space; (2) the opponent color space; (3) the Munsell (H,V,C) color space; and, (4) the Hue, Saturation, and Intensity (H,S,I) space. For the latter, similar to the Munsell space, Hue refers to the color of the pixel (from red, to green, to blue), Saturation is the "deepness" of the color (e.g., from greenish to deep saturated green), and Intensity is the brightness, or what the pixel would look like in a gray scale image.
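The observation that color is essentially two-dimensional can be illustrated by discarding the brightness component of a pixel. The sketch below uses the HSV conversion from Python's standard `colorsys` module as a stand-in; HSV is closely related to, but not identical to, the (H,S,I) space described above, and the helper name and sample colors are hypothetical.

```python
import colorsys


def hue_saturation(r, g, b):
    """Map an 8-bit RGB pixel to (hue, saturation), dropping brightness.

    Uses HSV as an approximation of the (H,S,I) description in the text.
    """
    h, s, _value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h, s


pale_green = hue_saturation(153, 204, 153)  # a "greenish" pixel
deep_green = hue_saturation(0, 255, 0)      # a deep saturated green pixel

print(pale_green, deep_green)
```

Both pixels share the same hue and differ only in saturation, which matches the description of Saturation as the "deepness" of a color.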
Texture, on the other hand, is a visual image feature that is much more difficult to capture computationally and is a feature that cannot be attributed to a single pixel but is attributed to a patch of image data. The texture of an image patch is a description of the spatial brightness variation in that patch. This can be a repetitive pattern (of texels), as the pattern on an artichoke or pineapple, or can be more random, like the pattern of the leaves of parsley. These are called structural textures and statistical textures, respectively. There exists a wide range of textures, ranging from the purely deterministic arrangement of a texel on some tessellation of the two-dimensional plane, to "salt and pepper" white noise. Research on image texture has been going on for over thirty years, and computational measures have been developed that are one-dimensional or higher-dimensional. However, in the prior art, histograms of texture features are not known to the inventors.
The shape of a boundary in an image is a feature of multiple boundary pixels. Boundary shape refers to local features, such as curvature. An apple will have a roughly constant curvature boundary, while a cucumber has a piece of low curvature, a piece of low negative curvature, and two pieces of high curvature (the end points). Other boundary shape measures can be used.
Some prior art uses color histograms to identify objects. Given an (R,G,B) color image of the target object, the color representation used for the histograms is the opponent color space: rg=R-G, by=2*B-R-G, and wb=R+G+B. The wb axis is divided into 8 sections, while the rg and by axes are each divided into 16 sections. This results in a three-dimensional histogram of 2048 bins. This system matches target image histograms to 66 pre-stored reference image histograms. The set of 66 pre-stored reference image histograms is fixed, and therefore it is not a trainable system, i.e., target images that are not recognized in one instance will not be recognized in a later instance.
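The opponent color histogram described above can be sketched as follows. The axis definitions and the 16 x 16 x 8 = 2048 bin layout come from the text; the 8-bit input range, the equal-width bins, and the flattened bin indexing are assumptions made for this illustration, and the function names are hypothetical.

```python
RG_BINS, BY_BINS, WB_BINS = 16, 16, 8  # 16 * 16 * 8 = 2048 bins in total


def bin_index(value, low, high, bins):
    """Equal-width bin index for value in [low, high] (assumed binning)."""
    index = int((value - low) * bins / (high - low))
    return min(index, bins - 1)  # the top of the range goes in the last bin


def opponent_histogram(pixels):
    """Build a flattened three-dimensional opponent color histogram
    from an iterable of 8-bit (R, G, B) tuples."""
    hist = [0] * (RG_BINS * BY_BINS * WB_BINS)
    for r, g, b in pixels:
        rg = r - g           # ranges over [-255, 255] for 8-bit input
        by = 2 * b - r - g   # ranges over [-510, 510]
        wb = r + g + b       # ranges over [0, 765]
        i = bin_index(rg, -255, 255, RG_BINS)
        j = bin_index(by, -510, 510, BY_BINS)
        k = bin_index(wb, 0, 765, WB_BINS)
        hist[(i * BY_BINS + j) * WB_BINS + k] += 1
    return hist


# Hypothetical object pixels: two pure red and one pure blue.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255)]
hist = opponent_histogram(pixels)
print(len(hist), sum(hist))
```

The two red pixels fall in the same bin and the blue pixel in a different one, so only two of the 2048 bins are occupied.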
U.S. Pat. No. 5,060,290 to Kelly and Klein discloses the grading of almonds based on gray scale histograms. Falling almonds are furnished with uniform light and pass by a linear camera. A gray histogram, quantized into 16 levels, of the image of the almond is developed. The histogram is normalized by dividing all bin counts by 1700, where 1700 pixels is the size of the largest almond expected. Five features are extracted from this histogram: (1) gray value of the peak; (2) range of the histogram; (3) number of pixels at the peak; (4) number of pixels in the bin to the right of the peak; and, (5) number of pixels in bin 4. Through lookup tables, an eight-digit code is developed and if this code is in a library, the almond is accepted. The system is not trainable. The appearances of almonds of acceptable quality are hard-coded in the algorithm and the system cannot be trained to grade almonds differently by showing new instances of almonds.
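The histogram and the five features listed above can be sketched roughly as below. Several details are assumptions, since the summary does not give them: 8-bit gray input, bins indexed from 0, and the "range" taken as the distance between the lowest and highest occupied bins. The lookup tables and code library are omitted because their contents are not described. All names are hypothetical.

```python
LEVELS = 16               # gray histogram quantized into 16 levels
MAX_ALMOND_PIXELS = 1700  # size of the largest almond expected


def gray_histogram(gray_pixels):
    """16-level histogram of 8-bit gray values, normalized by 1700."""
    counts = [0] * LEVELS
    for p in gray_pixels:
        counts[p * LEVELS // 256] += 1
    return [c / MAX_ALMOND_PIXELS for c in counts]


def almond_features(hist):
    """Return the five features as a tuple (interpretations are assumptions)."""
    peak = max(range(LEVELS), key=lambda i: hist[i])       # (1) peak gray level
    occupied = [i for i, c in enumerate(hist) if c > 0]
    spread = occupied[-1] - occupied[0] if occupied else 0  # (2) range
    at_peak = hist[peak]                                    # (3) pixels at peak
    right = hist[peak + 1] if peak + 1 < LEVELS else 0.0    # (4) bin right of peak
    in_bin_4 = hist[4]                                      # (5) bin 4, 0-based
    return (peak, spread, at_peak, right, in_bin_4)


# Hypothetical almond image: 1360 object pixels at three gray values.
pixels = [40] * 170 + [100] * 850 + [120] * 340
features = almond_features(gray_histogram(pixels))
print(features)
```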
U.S. Pat. No. 4,735,323 to Okada et al. discloses a mechanism for aligning and transporting an object to be inspected. The system more specifically relates to grading of oranges. The transported oranges are illuminated with a light within a predetermined wavelength range. The light reflected is received and converted into an electronic signal. A level histogram divided into 64 bins is developed, where

$$\text{Level} = \frac{\text{intensity of totally reflected light}}{\text{intensity of green light reflected by an orange}}$$
The median, N, of this histogram is determined and is considered as representing the color of an orange. Based on N, the orange coloring can be classified into four grades of "excellent," "good," "fair" and "poor," or can be graded finer. The system is not trainable, in that the appearance of the different grades of oranges is hard-coded into the algorithms.
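Grading from the histogram median can be sketched as below. The 64 bins and the four grade names come from the description above; the grade boundaries, the assumption that a higher median means a better grade, and all function names are hypothetical, since the summary gives no thresholds.

```python
BINS = 64


def histogram_median_bin(hist):
    """Return N, the index of the bin that contains the median sample."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram has no median")
    running = 0
    for index, count in enumerate(hist):
        running += count
        if 2 * running >= total:
            return index


# Hypothetical grade boundaries on N, checked from best to worst.
GRADE_LIMITS = [(48, "excellent"), (32, "good"), (16, "fair")]


def grade(n):
    """Map the median bin N to one of four grades (thresholds are assumed)."""
    for limit, label in GRADE_LIMITS:
        if n >= limit:
            return label
    return "poor"


# Hypothetical level histogram with most samples in bin 40.
hist = [0] * BINS
hist[10], hist[40], hist[50] = 5, 20, 5

n = histogram_median_bin(hist)
print(n, grade(n))
```

Because the boundaries are fixed constants, changing how oranges are graded means editing the code, which is the sense in which such a system is not trainable.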
The use of gray scale and color histograms is a very effective method for grading or verifying objects in an image. The main reason for this is that a histogram is a very compact representation of a reference object that does not depend on the location or orientation of the object in the image.
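The independence from location and orientation can be checked on a toy example. The sketch below assumes a small gray scale image with a uniform background, uses a 90-degree rotation and a cyclic shift as stand-ins for a change of orientation and location, and uses hypothetical function names.

```python
def gray_histogram(image, bins=4, max_value=256):
    """Histogram of gray values using equal-width bins over [0, max_value)."""
    hist = [0] * bins
    for row in image:
        for p in row:
            hist[p * bins // max_value] += 1
    return hist


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def shift_right(image):
    """Cyclically shift every row one pixel to the right."""
    return [row[-1:] + row[:-1] for row in image]


# Hypothetical 3x3 image: a few bright object pixels on a black background.
image = [
    [0, 0, 0],
    [0, 200, 90],
    [0, 255, 0],
]

original = gray_histogram(image)
rotated = gray_histogram(rotate_90(image))
shifted = gray_histogram(shift_right(image))
print(original, rotated, shifted)
```

Both transformations only rearrange pixels, so all three histograms are identical, while the nine-pixel image is summarized by just four numbers.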
However, for image histogram-based recognition to work, certain conditions have to be satisfied. It is required that: (1) the size of the object in the image is roughly known, (2) there is relatively little occlusion of the object (i.e., most of the object is in the image and not obscured by other objects), (3) there is little difference in illumination between the scenes from which the reference images and the target images are taken, and hence from which the reference object histograms and target object histograms are developed, and (4) the object can be easily segmented out from the background or there is relatively little distraction in the background. Under these conditions, comparing a target object image histogram with reference object image histograms has been achieved in numerous ways in the prior art.
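As one illustration of such a comparison, the sketch below identifies a target by finding the reference histogram with the smallest sum of absolute bin differences and falls back to a "reject" class when nothing is close enough. The distance measure, the normalization by pixel count, the rejection threshold, and the reference labels are all assumptions chosen for this example and are not attributed to any particular prior art system.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (removes object-size effects)."""
    total = sum(hist)
    return [c / total for c in hist] if total else list(hist)


def l1_distance(a, b):
    """Sum of absolute differences between two normalized histograms."""
    return sum(abs(x - y) for x, y in zip(normalize(a), normalize(b)))


def identify(target, references, reject_above=0.5):
    """Return the label of the closest reference, or "reject" if too far.

    reject_above is a hypothetical threshold on the L1 distance.
    """
    best_label, best_distance = "reject", float("inf")
    for label, reference in references.items():
        distance = l1_distance(target, reference)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= reject_above else "reject"


# Hypothetical 4-bin reference histograms.
references = {"apple": [8, 1, 1, 0], "lemon": [0, 1, 2, 7]}

print(identify([16, 3, 1, 0], references))  # larger object, apple-like shape
print(identify([1, 8, 1, 0], references))   # unlike either reference
```

Verification is the special case in which only the expected object's reference histogram is consulted and the answer is whether the distance falls below the threshold.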