This invention relates to the field of image processing and computer vision. More specifically, the invention relates to an apparatus and method for taking images of objects independent of the background and/or of the ambient illumination even if these objects are surrounded by plastic that can be somewhat translucent.
There are various prior art image processing and computer vision systems which acquire and/or process images of a scene. (Generally, a scene includes a background and one or more objects that are of interest.) Typically, in these systems, an analog image from a camera (image acquisition unit) is converted to a discrete representation by dividing the picture into a fixed number of locations called picture elements, or pixels, and quantizing the brightness or color of the image at those picture elements into a fixed number of values. Usually, color is represented as three different images, the red, the green and the blue image where the color of the pixels is quantized in a fixed number of values. The red, green and blue are referred to as the color channels or the spectral bands. Thus, much of the prior art develops a digital image of the actual image or scene and then processes the digital image using a computer. This processing, also called image processing or computer vision, includes modifying the scene image or obtaining properties from the scene image such as the identity or location of the objects in the scene.
Objects in the scene are illuminated when light falls on the object(s). Ambient illumination is the illumination due to light sources occurring in the environment such as the sun outdoors, room lights indoors, or a combination of artificial light and sunlight indoors. In general, the light reflected from an object patch, resulting in a brightness of the corresponding image pixels, is a mixture of a matte plus a glare (or specular) component, although at a given image pixel either the matte component or the glare component tends to dominate. The color of a matte reflection is a function of the natural color of the object and the color of the illuminating light (in the spectral domain, the illumination function and the reflection function are multiplied). Specular reflections (also called glare) are the bright highlights reflected off the surface of a shiny object. The color of the glare is mostly the color of the illuminating lights (as opposed to the natural color of the object).
The glare component is mostly unrelated to the object's intrinsic surface properties and, therefore, is of little use for object segmentation or recognition purposes. The matte reflection, on the other hand, is a function of the color of the object as well as the illuminating light. To produce an image which is more characteristic of an object's intrinsic color it is desirable to remove or suppress the specular component of reflection. One way to do this is by the use of polarizers. Because of diffusion in the surface layer of an object, matte reflection is not polarized. Specular reflection, on the other hand, is often polarized, especially as the viewing angle becomes more tangential to the surface. Thus adding a properly oriented polarizing filter to the camera will remove a certain portion of the glare. If all the illumination can be controlled, even better results can be obtained by deliberately polarizing the outgoing, illuminating light and then only sensing returned light with an altered polarization angle.
For these reasons an object's color, an important object property for object recognition, depends on the ambient light. In order to compensate for this effect, prior art solutions use the reflection of a white or gray patch in the scene. Color correction is then performed by transforming the image so that the color of the gray patch is transformed to a standard predetermined value. For instance, the patch image color spectrum could be transformed such that the spectrum of the patch image is uniform in the red, green and blue channels (spectral bands) and has a certain, preset reflectance. Indeed, the whole image including the object image is transformed in such a fashion. The object's color is thus represented for recognition purposes by its image color spectrum normalized by the standard color spectrum of the image of the gray patch. Such techniques, known as color constancy, are well known. An early example for gray scale images can be found in U.S. Pat. No. 4,314,281 to Wiggens and Elie, which is hereby incorporated by reference in its entirety.
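The gray-patch correction described above can be illustrated by a minimal sketch. It assumes a simple per-channel gain model, which is only one of several ways such a transform could be realized; the function name, the patch coordinates and the target value of 128 are hypothetical and do not come from the cited prior art.

```python
def gray_patch_correct(image, patch_pixels, target=128.0):
    """Scale each color channel so the gray patch maps to (target, target, target).

    image: list of rows, each row a list of (r, g, b) tuples.
    patch_pixels: list of (row, col) positions known to lie on the gray patch.
    The per-channel gain model and the target value are illustrative assumptions.
    """
    n = len(patch_pixels)
    # Mean color of the gray patch as imaged under the current ambient light.
    means = [sum(image[r][c][ch] for r, c in patch_pixels) / n for ch in range(3)]
    # One gain per channel; leave a channel unchanged if the patch reads zero.
    gains = [target / m if m > 0 else 1.0 for m in means]
    # Apply the same gains to every pixel, clipping to the 8-bit range.
    return [
        [tuple(min(255.0, px[ch] * gains[ch]) for ch in range(3)) for px in row]
        for row in image
    ]


# A bluish illuminant: the gray patch at (0, 0) reads (100, 50, 200).
image = [[(100, 50, 200), (50, 25, 100)]]
corrected = gray_patch_correct(image, patch_pixels=[(0, 0)])
```

Because the whole image is multiplied by the same gains, the second pixel, which has the same color balance as the patch at half the brightness, also becomes a neutral gray.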
Now consider the case where the object to be recognized is surrounded by a plastic bag. It is assumed that the transparency of the bag is high enough that a human can recognize the object. A part of the scene image (e.g., where the bag is flat) contains object image portions that appear as they would if there were no surrounding bag. However, even for those image parts the illuminating light passes through the bag, and then the reflected light passes through the bag again. Thus the color of the reflected light is influenced by any subtle tint of the bag, as well as by the bag's intrinsic diffuse reflectance properties, to an extent depending on the level of translucency and tint of the bag. Other parts of the bag may completely obscure the underlying object image due to specular reflection off the bag surface and, to a lesser extent, due to the fact that the bag is seen as opaque depending on the surface normal of the bag or folds in the bag. These phenomena make it difficult to gauge the true surface properties of an item enclosed by a bag.
During the image processing of the scene, the object (or objects) that is (are) of interest is (are) imaged along with the scene surroundings. These surroundings are called the background. The background is usually behind the object(s) of interest. In some types of image processing, it is necessary to separate the object(s) image from the background image of the scene. This separation is called figure/ground separation or segmentation. In such applications it is important that the segmented foreground portion accurately represents the properties of the object to be identified, and not be contaminated by illumination or other environmental artifacts.
This figure/ground separation is most often performed for the purposes of object recognition. U.S. Pat. No. 5,546,475 to Bolle et al. gives an example, where in combination with the segmentation techniques of U.S. Pat. No. 5,631,976 to Bolle et al., the object(s) in the segmented image are recognized using color features (in combination with other features). A segmentation of an image may, therefore, be denoted as a mapping S of pixels (x, y) into some space s, e.g., S: (x, y) → s, where S(x, y) is set to some value X if pixel (x, y) is not a part of the segment, and S(x, y) is set to the original pixel value I(x, y) if (x, y) is part of the segment. An alternate segmentation of an image could be a mapping (x, y) → {0, 1}, where an image point (x, y) is labeled ‘1’ if (x, y) is part of the segment and ‘0’ otherwise. Other variations are also possible, where s = [0, 1] and the membership of pixel (x, y) in the segmentation is expressed as a degree of membership. The set s could also take on a set of n (greater than two) discrete numbers.
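The first two segmentation mappings described above can be sketched as follows. This is an illustration only: the function names, the membership predicate and the fill value standing in for X are hypothetical choices, not part of the cited patents.

```python
def binary_segmentation(image, is_member):
    """Map every pixel (x, y) to 1 if it belongs to the segment, else 0."""
    return [
        [1 if is_member(x, y) else 0 for x in range(len(row))]
        for y, row in enumerate(image)
    ]


def masked_segmentation(image, mask, fill=0):
    """Keep the original value I(x, y) inside the segment and `fill` (X) outside."""
    return [
        [pixel if m else fill for pixel, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# A 2x2 gray-scale image whose segment is taken to be the diagonal (assumed).
image = [[10, 20], [30, 40]]
mask = binary_segmentation(image, lambda x, y: x == y)
segment = masked_segmentation(image, mask)
```

Here `mask` is the {0, 1} labeling and `segment` is the mapping that retains original pixel values inside the segment. A degree-of-membership variant would store a value in [0, 1] in place of each label.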
Figure/ground separation of some sort is required when using computer vision technology to recognize produce (fruit and vegetables) at the point of sale (POS) in supermarkets and grocery stores. The ability to automatically recognize produce at the checkout counter has many advantages, among them:
There is no need to affix the PLU (price lookup) stickers to the produce.
There is less need for prepackaging the produce, thereby saving solid waste.
The checkout of produce will be speedier because the checkers do not have to recall or look up the PLU numbers.
Produce inventory control can be done more accurately.
Pricing can be done more consistently and accurately.
Self-checkout by customers becomes more convenient.
Sweethearting of produce (checkers giving away produce to friends and family) is harder.
The overall losses (shrinkage) of produce will be reduced.
Typically, such produce items are enclosed in plastic bags by the customer and it is undesirable to require the customer or checker to remove these bags before performing recognition. Similarly, computer vision technology can be used for recognition of other items sold in bulk, such as bread, candies, etc., which are also usually enclosed by bags and hence present the same problems.
Prior art image processing systems cannot easily separate objects of interest from the background of the scene. For example, there are systems which inspect or recognize parts in an assembly line from images of those parts. There are also special effects systems which mix the image of actors with special backgrounds which may be created separately by computers. These systems obtain an image of the object amenable to processing by presenting the object against a background which is readily and simply distinguishable from the object. For instance, part inspection systems may image the parts against a black or white surface (using techniques such as grazing illumination, dark field imaging, or intensity thresholding against a retro-reflective background). Special effects systems usually require the actors to be imaged before a blue or green surface (called “matting”, “chroma-keying”, “blue screening”, or the Ultimatte process). These and other systems will fail if the background is arbitrary and not specially controlled. One such system is the Ultimatte system as described in U.S. Pat. No. 4,625,231 to Vlahos, which is herein incorporated by reference in its entirety.
Another well-known approach for less-uniform backgrounds is to pixel-wise subtract an image of the background alone from an image containing the background plus an object of interest. General purpose background subtraction methods can be found in
D. Ballard and C. Brown, Computer Vision, pp. 72-73. Prentice-Hall: New Jersey, 1982.
This reference is incorporated herein in its entirety. Image processing and computer vision techniques for background subtraction rely on methods that somehow derive the background image from the original image. One sophisticated background model is to use a temporal low-pass variant of the original image constructed from an unlabelled sequence of images. In the current (POS) application, however, the system has access to images, Fb, of the background acquired when the objects surrounded by the plastic bag are not in the camera's field of view. The simplest method for background subtraction is then Fn = F − Fb, where F is the original image. However, this simple method has a number of problems. For those pixels x where there is only plastic bag visible, F(x) is not equal to Fb(x) so these pixels would be counted as foreground. Yet the most informative foreground image, Fn, should only contain pixels corresponding to the object.
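The shortcoming of the simple subtraction Fn = F − Fb can be illustrated with a minimal sketch on gray-scale values. The thresholding step, the threshold of 20 and the sample pixel values are assumptions introduced for illustration; the text above specifies only the difference itself.

```python
def difference_mask(frame, background, threshold=20):
    """Label a pixel 1 (foreground) where |F - Fb| exceeds the threshold, else 0.

    The threshold is an illustrative assumption, not a value from the text.
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


# One image row: bare background, a bag-only pixel, and the object seen
# through the bag (all values are made up for the example).
background = [[100, 100, 100]]
frame = [[100, 130, 200]]
mask = difference_mask(frame, background)
```

The bag-only pixel differs from the background by 30 gray levels and is therefore labeled foreground together with the object pixel, which is exactly the contamination described above.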
Also, some prior art systems have difficulty determining object properties in varying ambient light. For example, many image processing and computer vision systems work by measuring the color or intensity in the image. These color and intensity measurements depend critically on the light illuminating the imaged object and may fail if the object is presented in different ambient light. For these systems the usual solution is to enclose the object in a specially lighted chamber, or carefully control all the lights in the space where the image is taken (i.e., all the lights on the factory floor or in the studio).
Glare reflected from shiny surfaces also presents problems that are difficult to solve for many prior art image processing systems, especially glare from metallic industrial parts. Image processing and computer vision systems have difficulty imaging shiny surfaces such as glass plates or metallic objects due to the glare generated by light reflecting off these shiny surfaces. This is because glare reflected into the imaging systems obscures the object, masks certain surface features, or is interpreted as an intentional mark. In such circumstances, a segmentation system will often falsely omit part of the object due to these highlights.
Prior art segmentation techniques fail on images of objects that are surrounded by plastic bags, primarily because they mark all of the bag as foreground. The above-described artifacts introduced by the bag's light transmission and light reflection properties severely impair the segmentation techniques. As discussed above, parts of the object image are simply obscured due to the specular reflection of the bag. This results in regions that contain holes, assuming the specular regions are detected as not being part of the object and removed. While this might be good for surface properties, it can severely distort the overall shape of an object. Furthermore, parts of the object boundaries can also be obscured by these specular reflections, which typically results in false and wobbly boundaries when using prior art boundary finding techniques.
There are also other imaging artifacts due to the surrounding plastic bag that impair object recognition. Not all of these directly affect segmentation. One effect is the introduction of false image texture because of two causes: the scattered pattern of specular patches, and the fact that light transmission properties of the bag vary over its surface. The resultant object image is the true object image multiplied by a varying attenuation function plus a nonlinear function that represents the bag's specularity. There is also a subtle imaging effect due to increased inter-reflection. After the illuminating light enters the plastic bag it may then bounce around between the inner surface of the plastic bag and the object surfaces. This means that in local areas of the image the true illuminant is composed of not only the ambient sources, but also the photons reflected from nearby colored surfaces (the “buttercup” effect). Effectively, the color of the illuminating light has changed locally and, hence, the color of the light reflected from the object changes.
U.S. Pat. No. 5,631,976 to Bolle et al. proposes a method and apparatus for segmenting images of an object into object image and background image by controlling a light source to illuminate the object so that the light is brighter in one scene image than in another scene image. The method also considers objects that are surrounded by plastic bags that may be somewhat translucent. However, this system achieves color constancy by enclosing the image input device and the light source in an opaque box with an opening through which the input device can view and the light can illuminate the object. This largely eliminates the effects of ambient light, but that means it is generally not possible to retrofit existing installations due to the large size of the box and other geometric constraints. Furthermore, customers and operators may find the flashing light distracting.
Therefore a first object of the present invention is an improved apparatus and method for imaging objects independently and separately of the background.
A further object of this invention is an improved apparatus and method for imaging and segmenting objects independent of background, ambient illumination and glare.
Another object of this invention is an improved apparatus and method for imaging and segmenting objects independent of background, ambient illumination, glare and other imaging artifacts due to a surrounding plastic bag.
This invention describes a system and method for segmenting an object image from a background image. The image processing for segmentation handles, in a novel manner, images that are acquired when the scene is illuminated by unknown ambient light sources. Hence, no special illumination of the scene is required. An image of the scene containing the object of interest plus a separate image of the background scene are captured by an image input device. These images are first color corrected, using the image of a gray patch that is visible in the images, so that the gray patch in both images has some standard gray value. A further (prior art) transform converts the images from a red, green and blue format into a hue, saturation and intensity (HSI) representation. The two images are then compared on a pixel-by-pixel basis in the hue, saturation and intensity domain. For this, there is a sequence of specific tests to be performed on the HSI values of a pixel in the foreground image in order to compare them to the HSI values of the same pixel in the background image. These tests determine whether an image pixel is a foreground pixel or not. Further tests are executed for the special case where the scene object is surrounded by a plastic bag.
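A pixel-by-pixel comparison of this general kind might look like the following sketch. It is not the sequence of tests of the invention, which this section does not enumerate. The order of the tests and all thresholds are hypothetical, and the standard-library HSV conversion (`colorsys.rgb_to_hsv`) is used as a stand-in for an HSI transform.

```python
import colorsys


def is_foreground(fg_rgb, bg_rgb, d_hue=0.05, d_sat=0.15, d_val=0.15, min_sat=0.2):
    """Decide whether a pixel differs enough from the background to be foreground.

    fg_rgb, bg_rgb: 8-bit (r, g, b) values of the same pixel in the scene image
    and in the background image. All thresholds are illustrative assumptions.
    """
    fh, fs, fv = colorsys.rgb_to_hsv(*(c / 255.0 for c in fg_rgb))
    bh, bs, bv = colorsys.rgb_to_hsv(*(c / 255.0 for c in bg_rgb))
    # Test 1: a large brightness change.
    if abs(fv - bv) > d_val:
        return True
    # Test 2: a large saturation change.
    if abs(fs - bs) > d_sat:
        return True
    # Test 3: a hue change, trusted only when both pixels are clearly colored,
    # since hue is unstable for nearly gray pixels. Hue wraps around at 1.0.
    hue_diff = abs(fh - bh)
    hue_diff = min(hue_diff, 1.0 - hue_diff)
    return min(fs, bs) > min_sat and hue_diff > d_hue


unchanged = is_foreground((120, 120, 120), (122, 121, 120))
brighter = is_foreground((200, 30, 30), (120, 120, 120))
recolored = is_foreground((200, 100, 100), (100, 200, 100))
```

The third case has the same brightness and saturation in both images and is separated only by hue, which is the kind of difference that a comparison in the red, green and blue domain expresses less directly.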