1. Technical Field
The invention is related to a computer-implemented object recognition system and process for identifying people and objects in an image of a scene, and more particularly, to such a system and process employing color images, color histograms, and techniques for compensating for variations in illumination in the scene, as well as employing a sum of match qualities approach to best identify each of a group of people and objects in the image of the scene.
2. Background Art
Object recognition in images is typically based on a model of the object at some level of abstraction. This model is matched to an input image which has been abstracted to the same level as the model. At the lowest level of abstraction (no abstraction at all), an object can be modeled as a whole image and compared, pixel by pixel, against a raw input image. However, more often unimportant details are abstracted away, such as by using sub-templates (ignoring background and image position), normalized correlation (ignoring illumination brightness), or edge features (ignoring low spatial frequencies). The abstraction itself is embodied in both the representation of the object and in the way it is matched to the abstracted image. For instance, Huttenlocher et al. [1] represent objects as simple edge points and then match with the Hausdorff distance. While the edge points form a completely rigid representation, the matching allows the points to move nonrigidly.
One interesting dimension of the aforementioned abstraction is rigidity. Near one end of this dimension are the several object recognition algorithms that abstract objects into a rigid or semi-rigid geometric juxtaposition of image features. These include Hausdorff distance [1], geometric hashing [2], active blobs [3], and eigenimages [4, 5]. In contrast, some histogram-based approaches abstract away (nearly) all geometric relationships between pixels. In pure histogram matching, e.g. Swain & Ballard [6], there is no preservation of geometry, just an accounting of the number of pixels of given colors.
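As an illustration of pure histogram matching, the following minimal sketch computes a coarse color histogram and scores it with the histogram intersection measure associated with Swain & Ballard [6] (the sum of bin-wise minima, normalized by the model's pixel count). The function names, the choice of four bins per channel, and the toy pixel lists are assumptions made for this example only and are not taken from the cited work or from the invention.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized RGB bin; pixel positions are ignored."""
    step = 256 // bins_per_channel
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)


def histogram_intersection(image_hist, model_hist):
    """Sum of bin-wise minima, normalized by the model's pixel count."""
    overlap = sum(min(image_hist[b], count) for b, count in model_hist.items())
    return overlap / sum(model_hist.values())


# Toy data (hypothetical): a model with 6 reddish and 4 bluish pixels.
model_pixels = [(200, 30, 30)] * 6 + [(20, 20, 220)] * 4
# Same pixels in a different spatial order: geometry changes, colors do not.
scrambled_pixels = list(reversed(model_pixels))
# A different object: 2 reddish and 8 greenish pixels.
other_pixels = [(200, 30, 30)] * 2 + [(30, 200, 30)] * 8

model_hist = color_histogram(model_pixels)
same_score = histogram_intersection(color_histogram(scrambled_pixels), model_hist)
other_score = histogram_intersection(color_histogram(other_pixels), model_hist)

print(same_score)   # the scrambled copy matches the model fully
print(other_score)  # only the shared reddish pixels contribute
```

Because the histogram records only how many pixels fall in each color bin, rearranging the pixels leaves the score unchanged, while an object with different colors scores low.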
Abstracting away rigidity is attractive, because it allows the algorithm to work on non-rigid objects and because it reduces the number of model images necessary to account for appearance changes. For example, color histograms are invariant to translation and rotation about the viewing axis, and change only slowly under change of angle of viewing, change in scale, and occlusion. Because histograms change slowly with view, a three-dimensional object can be adequately represented by a small number of histograms.
However, the use of histograms for object recognition systems is not without drawbacks. One of these drawbacks involves identifying each of a group of people in an image of a scene. Typically, the aforementioned matching of models to an input image involves the use of a threshold, where a model is deemed to match a portion of the input image when their similarity is above this threshold. The threshold is usually chosen so that it is reasonably certain that a portion of the input image actually corresponds to the person or object in the “matching” model. However, it is not chosen to be so high that anticipated variations in the abstractions of the same person or object between the model and the input image cannot be accounted for in the matching process. This thresholding scenario can present a problem, though, when it is desired that more than one person or object be identified in the input image. Essentially, it is possible that the abstractions of two different people or objects from the input image may both match the abstraction of a single model, in that the aforementioned threshold is exceeded when each is compared to the model. Thus, there is a question as to the actual identity of each of these people or objects.
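The ambiguity described above can be made concrete with a small sketch. The similarity scores, the threshold value, and the region and model names below are all hypothetical. The second function shows one plausible reading of a "sum of match qualities" style of resolution, in which each model is assigned to at most one image region and the assignment with the largest total score is chosen; it is an assumption for illustration, not a statement of the method detailed elsewhere in this description.

```python
from itertools import permutations

# Hypothetical similarity scores: scores[region][model], higher is better.
scores = {
    "region_1": {"alice": 0.82, "bob": 0.78},
    "region_2": {"alice": 0.80, "bob": 0.55},
}
THRESHOLD = 0.75  # assumed value, chosen only for this example


def matches_above_threshold(scores, threshold):
    """List, for each region, every model whose score exceeds the threshold."""
    return {
        region: [model for model, s in row.items() if s > threshold]
        for region, row in scores.items()
    }


def best_joint_assignment(scores):
    """Pick the one-to-one region/model pairing with the largest score sum."""
    regions = list(scores)
    models = list(next(iter(scores.values())))
    best = max(
        permutations(models, len(regions)),
        key=lambda combo: sum(scores[r][m] for r, m in zip(regions, combo)),
    )
    return dict(zip(regions, best))


ambiguous = matches_above_threshold(scores, THRESHOLD)
assignment = best_joint_assignment(scores)

print(ambiguous)   # both regions exceed the threshold for "alice"
print(assignment)  # the joint choice gives each region a distinct model
```

With these toy numbers, thresholding alone lets both regions claim the same model, whereas considering all regions together yields a single consistent labeling. Exhaustive enumeration is used here only because the example is tiny.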
Another particularly troublesome drawback to the use of histograms in object recognition systems is caused by the fact that illumination conditions typically vary from place to place in a scene. Variations in illumination can significantly alter a histogram of an image as the apparent colors tend to change. Thus, a histogram created from an image of a person or object at a first location under one lighting condition may not match a histogram created from an image of the same person or object at another location in the scene which is under different lighting conditions. If the deviation is severe enough, it will not be possible to recognize that the two histograms are associated with the same person or object. Lighting conditions can also change in a scene over the course of a day. Thus, even if a person or object is in the same location for extended periods of time, the illumination conditions, and so the computed histograms, might change. Here again it may become impossible to recognize that the histograms belong to the same person or object if the change in illumination is significant. The system and process according to the present invention introduce some unique techniques to the use of histograms for object recognition that mitigate the above-described issues.
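The sensitivity of histograms to illumination can be seen in the following minimal sketch, which only demonstrates the problem and does not implement any compensation technique. Illumination change is modeled here as a uniform gain applied to every color channel, which is a simplifying assumption; the function names, bin count, and pixel values are likewise hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized RGB bin."""
    step = 256 // bins_per_channel
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)


def histogram_intersection(image_hist, model_hist):
    """Sum of bin-wise minima, normalized by the model's pixel count."""
    overlap = sum(min(image_hist[b], count) for b, count in model_hist.items())
    return overlap / sum(model_hist.values())


def relight(pixels, gain):
    """Scale every channel by a uniform gain (assumed illumination model)."""
    return [tuple(min(255, int(c * gain)) for c in p) for p in pixels]


# Toy pixels (hypothetical) for one person under the original lighting.
person = [(200, 120, 40)] * 7 + [(90, 90, 200)] * 3
model_hist = color_histogram(person)

same_light_score = histogram_intersection(
    color_histogram(relight(person, 1.0)), model_hist
)
dim_light_score = histogram_intersection(
    color_histogram(relight(person, 0.5)), model_hist
)

print(same_light_score)  # unchanged lighting: full match
print(dim_light_score)   # halved brightness moves every pixel to another bin
```

In this toy case, halving the brightness shifts every pixel into a different color bin, so the same person no longer overlaps with the stored model at all.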
It is noted that in the preceding paragraphs the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [4, 5]. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.