1. Technical Field
The invention is related to a computer-implemented object recognition system and process for identifying people and objects in an image of a scene, and more particularly, to such a system and process employing color images, color histograms, and techniques for compensating for variations in illumination in the scene, as well as employing a sum of match qualities approach to best identify each of a group of people and objects in the image of the scene.
2. Background Art
Object recognition in images is typically based on a model of the object at some level of abstraction. This model is matched to an input image which has been abstracted to the same level as the model. At the lowest level of abstraction (no abstraction at all), an object can be modeled as a whole image and compared, pixel by pixel, against a raw input image. However, more often unimportant details are abstracted away, such as by using sub-templates (ignoring background and image position), normalized correlation (ignoring illumination brightness), or edge features (ignoring low spatial frequencies). The abstraction itself is embodied in both the representation of the object and in the way it is matched to the abstracted image. For instance, Huttenlocher et al. [1] represent objects as simple edge points and then match with the Hausdorff distance. While the edge points form a completely rigid representation, the matching allows the points to move nonrigidly.
One interesting dimension of the aforementioned abstraction is rigidity. Near one end of this dimension are the several object recognition algorithms that abstract objects into a rigid or semi-rigid geometric juxtaposition of image features. These include Hausdorff distance [1], geometric hashing [2], active blobs [3], and eigenimages [4, 5]. In contrast, some histogram-based approaches abstract away (nearly) all geometric relationships between pixels. In pure histogram matching, e.g. Swain and Ballard [6], there is no preservation of geometry, just an accounting of the number of pixels of given colors.
Abstracting away rigidity is attractive, because it allows the algorithm to work on non-rigid objects and because it reduces the number of model images necessary to account for appearance changes. For example, color histograms are invariant to translation and rotation about the viewing axis, and change only slowly under change of angle of viewing, change in scale, and occlusion. Because histograms change slowly with view, a three-dimensional object can be adequately represented by a small number of histograms.
However, the use of histograms for object recognition systems is not without drawbacks. One of these drawbacks involves identifying each of a group of people in an image of a scene. Typically, the aforementioned matching of models to an input image involves the use of a threshold where a model is deemed to match a portion of the input image when their similarity is above this threshold. The threshold is usually chosen so that it is reasonably certain that a portion of the input image actually corresponds to the person or object in the “matching” model. However, it is not chosen to be so high that anticipated variations in the abstractions of the same person or object between the model and the input image cannot be accounted for in the matching process. This thresholding scenario can present a problem though when it is desired that more than one person or object be identified in the input image. Essentially, it is possible that the abstractions of two different people or objects from the input image may both match the abstraction of a single model in that the aforementioned threshold is exceeded when each is compared to the model. Thus, there is a question as to the actual identity of each of these people or objects.
Another particularly troublesome drawback to the use of histograms in object recognition systems is caused by the fact that illumination conditions typically vary from place to place in a scene. Variations in illumination can significantly alter a histogram of an image as the apparent colors tend to change. Thus, a histogram created from an image of a person or object at a first location under one lighting condition may not match a histogram created from an image of the same person or object at another location in the scene which is under different lighting conditions. If the deviation is severe enough, it will not be possible to recognize that the two histograms are associated with the same person or object. Lighting conditions can also change in a scene over the course of a day. Thus, even if a person or object is in the same location for extended periods of time, the illumination conditions, and so the computed histograms, might change. Here again it may become impossible to recognize that the histograms belong to the same person or object if the change in illumination is significant. The system and process according to the present invention introduces some unique techniques to the use of histograms for object recognition that mitigate the above described issues.
It is noted that in the preceding paragraphs the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [4, 5]. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
This invention is directed toward an object recognition system and process that identifies people and objects depicted in an image of a scene. In general, this system and process entails first creating, by some not necessarily automatic means, model histograms of the people and objects that it is desired to identify in the image. Then, the image is segmented to extract regions which likely correspond to the people and objects being identified. In our terminology, a “model histogram” is a stored histogram that is associated with a certain person or thing, and it tells what the person or thing is expected to look like. A “region histogram” is a histogram extracted from the actual “live” image of the scene. A region histogram is computed for each of the extracted regions, and a match quality indicator of the degree of similarity between each extracted region histogram and each of the model histograms is computed. Each extracted region having a histogram that exhibits a degree of similarity to one of the model histograms which exceeds a prescribed threshold is designated as corresponding to the person or object associated with that model histogram. In one embodiment of the present invention, this designation is accomplished as follows. The largest match quality indicator is identified for each extracted region, and the region is designated as being the person or object associated with the model histogram used in part to compute the largest match quality indicator whenever the indicator exceeds the aforementioned threshold. In the case where extracted regions have histograms that do not exhibit a degree of similarity to any of the model histograms which exceeds the prescribed threshold, the designation technique employed is preferably the same, except that the designation is one of an unknown person or object.
In addition, the region histogram computed for any extracted region of the image that is designated as corresponding to a person or object associated with a model histogram is stored as an additional model histogram associated with that person or object.
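The per-region designation and the storing of recognized region histograms described above can be sketched as follows. This is a minimal illustration only, not the implementation of the invention. The function names, the `quality[r][m]` table layout, the identity labels, and the dictionary used as a model store are all assumptions made for the example.

```python
def designate_regions(quality, labels, threshold):
    """Pick, for each region, the model with the largest match quality.

    quality[r][m] is the match quality indicator of region r against
    model m (hypothetical layout); labels[m] names the person or object
    behind model m. A region is designated only when its largest
    indicator exceeds the threshold; otherwise it is unknown (None).
    """
    identities = []
    for row in quality:
        if not row:
            identities.append(None)
            continue
        best = max(range(len(row)), key=lambda m: row[m])
        identities.append(labels[best] if row[best] > threshold else None)
    return identities


def store_recognized(model_store, region_hists, identities):
    """Keep each recognized region histogram as an additional model."""
    for hist, who in zip(region_hists, identities):
        if who is not None:
            model_store.setdefault(who, []).append(list(hist))


quality = [[0.9, 0.2],   # region 0 matches model 0 well
           [0.3, 0.4]]   # region 1 matches nothing above threshold
identities = designate_regions(quality, ["alice", "bob"], threshold=0.5)
store = {}
store_recognized(store, [[1, 2], [3, 4]], identities)
```

In this example region 0 is designated as the first model's identity and region 1 is left unknown, so only the first region histogram is added to the store.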
In an alternate embodiment of the present invention, the designation that an extracted region corresponds to the person or object associated with a model histogram is accomplished as follows. First, exclusive combinations of the match quality indicators are formed. Each of these combinations is made up of one indicator associated with each extracted region of the image, and each indicator in the combination is derived from a different model histogram. A combined degree of similarity value is computed for each of the indicator combinations. Preferably, this entails summing the indicators in each combination to produce the combined indicator. The largest of the combined indicators is then identified. Each extracted region whose histogram is associated with an indicator that was used to compute the largest combined indicator, and that itself exceeds the prescribed threshold, is designated as corresponding to the person or object associated with the model histogram used in part to compute that indicator. A histogram computed for any extracted region of the image that is designated as corresponding to a person or object associated with a model histogram can also be stored as an additional model histogram, as in the previous embodiment. In addition, any of the remaining extracted regions having histograms associated with the indicators used to compute the largest combined indicator that do not exceed the prescribed threshold can be designated as corresponding to an unidentified person or object.
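The sum of match qualities approach can be illustrated with a brute-force sketch. It assumes the same hypothetical `quality[r][m]` table as before and at least as many models as regions. Enumerating every exclusive combination with `itertools.permutations` is a choice made here for clarity; the text does not prescribe how the combinations are searched.

```python
from itertools import permutations


def assign_by_sum_of_match_qualities(quality, threshold):
    """Return one model index (or None) per region.

    Every exclusive combination pairs each region with a different
    model. The combination with the largest summed match quality wins;
    within it, a region keeps its model only if its own indicator
    exceeds the threshold, otherwise it is unidentified (None).
    """
    n_regions = len(quality)
    n_models = len(quality[0]) if quality else 0
    if n_regions > n_models:
        raise ValueError("sketch assumes at least as many models as regions")

    best_sum, best_combo = float("-inf"), ()
    for combo in permutations(range(n_models), n_regions):
        total = sum(quality[r][m] for r, m in enumerate(combo))
        if total > best_sum:
            best_sum, best_combo = total, combo

    return [m if quality[r][m] > threshold else None
            for r, m in enumerate(best_combo)]


# Both regions individually prefer model 0, but only one can have it.
quality = [[0.90, 0.80],
           [0.85, 0.30]]
assignment = assign_by_sum_of_match_qualities(quality, threshold=0.5)
```

Here the combination (region 0 to model 1, region 1 to model 0) sums to 1.65, beating the alternative's 1.2, which resolves the ambiguity that per-region thresholding alone would leave. The exhaustive search grows factorially with the number of regions, so it is only practical for small groups.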
Preferably, in either embodiment, the system and process is repeated for subsequently generated images of the scene, so that the identity of people and objects can be monitored over time as they move into and about the scene. This makes the action of storing additional model histograms particularly advantageous as it improves the robustness of the object recognition technique. This is because people and objects can be “cataloged” in various parts of the scene and at different times. As discussed previously, illumination differences throughout a scene can affect histograms significantly. Therefore, having model histograms associated with a person or object in different parts of the scene improves the chances of an accurate identification.
The aforementioned image is preferably a color image of the scene, and the model histograms and region histograms are color histograms. When color images and histograms are used, it is preferred that they be created as follows. Model histograms are created by first capturing one or more model images of the people and objects it is desired to identify. Each model image is segmented to extract model regions which correspond to each of the aforementioned people and objects. Then, for each model region, the actual colors exhibited by the pixels of the model region are determined and the overall gamut of actual colors exhibited by the pixels is divided into a series of discrete color ranges, hereinafter referred to as quantized color categories. Each pixel of the extracted model region is respectively assigned to the quantized color category into which the actual color of the pixel falls. Finally, a model color histogram is produced by establishing a count of the number of pixels of each extracted model region assigned to the same quantized color category. The method for computing color histograms for extracted regions of the color image of the scene is identical, and it is preferred that the same quantized color categories be used for each histogram.
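A color histogram of this kind can be sketched in a few lines. The sketch assumes 8-bit RGB pixels given as `(r, g, b)` tuples and uniform quantization of each channel into `bins_per_channel` ranges. Both are illustrative assumptions; the text above only requires that the gamut be divided into discrete color ranges and that the same categories be used for every histogram.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count the pixels that fall into each quantized color category.

    pixels is an iterable of (r, g, b) tuples with 8-bit channels
    (assumed). Each channel is split into bins_per_channel uniform
    ranges, giving bins_per_channel ** 3 categories in total.
    """
    b = bins_per_channel
    hist = [0] * (b ** 3)
    for red, green, blue in pixels:
        # Map 0..255 onto 0..b-1 for each channel.
        rq, gq, bq = red * b // 256, green * b // 256, blue * b // 256
        hist[(rq * b + gq) * b + bq] += 1
    return hist


region_pixels = [(0, 0, 0), (10, 20, 30), (255, 255, 255), (255, 0, 0)]
hist = color_histogram(region_pixels, bins_per_channel=2)
```

With two bins per channel, the two dark pixels share category 0, the red pixel lands in category 4, and the white pixel in category 7.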
In addition, when color images and color histograms are employed in the present object recognition system and process, it is preferred that the degree of similarity between an extracted region histogram and a model histogram be assessed as follows. First, the respective pixel count from each quantized color category of the histogram derived from the extracted region is compared to the pixel count from the corresponding quantized color category of the model histogram. The purpose of this is to identify the smaller of the two counts. These smaller counts are then added together to produce a similarity value. This assessment is repeated for each pair of histograms compared. It is noted that two matching histograms will have a larger similarity value than non-matching histograms because the smaller count from each category will be nearly as large as the larger count, whereas the smaller counts in non-matching histograms are likely to be significantly smaller than the larger counts. Thus, the sum of the smaller counts from matching histograms should be larger than the sum of the smaller counts for non-matching histograms. Finally, the similarity value associated with each pair of compared histograms is normalized for each extracted region of the image of the scene. This is accomplished by dividing the similarity value by a maximum possible similarity value to produce the aforementioned match quality indicator.
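The similarity assessment above amounts to a histogram intersection followed by normalization. In the sketch below, the maximum possible similarity value is taken to be the total pixel count of the region histogram. That is an assumption made for illustration, since the text does not state which value is used.

```python
def match_quality(region_hist, model_hist):
    """Normalized histogram intersection of two equal-length histograms.

    The similarity value is the sum, over all quantized color
    categories, of the smaller of the two pixel counts. It is divided
    by the region's total pixel count (assumed normalizer), so the
    result lies between 0 and 1.
    """
    if len(region_hist) != len(model_hist):
        raise ValueError("histograms must use the same color categories")
    similarity = sum(min(r, m) for r, m in zip(region_hist, model_hist))
    max_possible = sum(region_hist)
    return similarity / max_possible if max_possible else 0.0


quality = match_quality([4, 0, 6], [3, 2, 5])
```

For these histograms the smaller counts are 3, 0, and 5, giving a similarity value of 8 and a match quality of 0.8 after dividing by the region's 10 pixels.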
The present invention also encompasses a refined system and process for identifying people and objects in an image of a scene. This refined process begins with the creation of initial model histograms of the people and objects that it is desired to identify in “live” images of the scene. This is preferably accomplished by first dividing one or more prefatory images of the scene into a plurality of cells. Each of the initial model histograms is created from a region extracted from a prefatory image which is known to depict a particular person or object of interest. Each of these initial model histograms is then assigned to the image cell in which the centroid of its associated extracted region resides. The extraction and histogram creation procedures are accomplished in the same manner as discussed above in connection with the description of the basic object recognition process.
Once the initial model histograms are created and assigned to the appropriate cell, the processing of “live” images of the scene can begin. Here again the segmentation of the live image to extract regions likely to depict a person or object of interest and the creation of a histogram from each extracted region proceed as discussed previously. Each live image is then divided into the aforementioned plurality of cells. The centroid of each extracted region is determined and the cell in which it resides is identified. Next, for each “set” of model histograms associated with the same person or object, it is determined which cell having one of the model histograms in the set assigned to it is the closest to the identified cell associated with the extracted region. Then, the same assessment techniques of either of the above-described embodiments of the basic process are employed to assess the degree of similarity between the histogram computed for an extracted region and each of the model histograms determined to correspond to a different one of the people or objects of interest and assigned to the cell closest to the identified cell (which of course could be the identified cell itself). If it is determined that an extracted region's histogram exhibits a degree of similarity to one of the model histograms which exceeds the aforementioned threshold, then the region's histogram is designated as corresponding to the person or object associated with that model histogram. The extracted region's histogram can also be stored as an additional model histogram associated with the designated person or object and assigned to the cell in which the centroid of the corresponding extracted region resides. Each of the aforementioned locations in the image of the scene can thus be associated with its own set of stored model histograms.
This would, for example, account for differences in the lighting conditions at different locations within the scene which could cause an extracted region's histogram to vary significantly, and so not match the model histogram associated with the person or object. The preferred conditions under which the region histogram is stored as an additional model histogram in the refined system and process will be discussed later in this summary. As with the basic process, when an extracted region's histogram does not exhibit a degree of similarity to any of the previously identified model histograms which exceeds the prescribed threshold, it is designated as corresponding to a person or object of unknown identity, and ignored. In addition, as with the basic object recognition technique, it is preferred that the refined system and process be repeated for subsequently generated images of the scene, and that color images and color histograms be employed.
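The cell lookup in the refined process can be sketched as follows. The sketch assumes square cells of a fixed pixel size, cells identified by `(row, column)` indices, closeness measured as squared Euclidean distance between cell indices, and a nested dictionary holding the model histograms. None of these details is specified in the text; they are chosen only to make the example concrete.

```python
def cell_of(centroid, cell_size):
    """Return the (row, column) cell containing an (x, y) centroid."""
    x, y = centroid
    return (int(y // cell_size), int(x // cell_size))


def nearest_models(models_by_identity, region_cell):
    """Select, per identity, the model histograms from the closest cell.

    models_by_identity maps identity -> {cell: [histogram, ...]}
    (hypothetical layout). The closest cell may be the region's own.
    """
    chosen = {}
    for identity, by_cell in models_by_identity.items():
        if not by_cell:
            continue
        nearest = min(
            by_cell,
            key=lambda c: (c[0] - region_cell[0]) ** 2
                          + (c[1] - region_cell[1]) ** 2,
        )
        chosen[identity] = by_cell[nearest]
    return chosen


models = {
    "alice": {(0, 0): [[1]], (0, 3): [[2]]},
    "bob": {(5, 5): [[3]]},
}
region_cell = cell_of((130, 45), cell_size=64)
candidates = nearest_models(models, region_cell)
```

The histograms in `candidates` would then be compared against the region histogram using either designation technique of the basic process.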
The preferred conditions under which a region histogram is stored as an additional model histogram in the refined system and process are as follows. It is first determined for each extracted region whether a histogram associated with the person or object corresponding to the histogram derived from the extracted region was previously stored and assigned to the cell containing the centroid of the extracted region. If such a histogram was not previously stored and assigned, then the histogram derived from the extracted region is stored as an additional model histogram and assigned to the cell containing the centroid of the extracted region. Whenever it is determined that a histogram associated with the person or object corresponding to the histogram derived from the extracted region was previously stored and assigned to the cell containing the centroid of the extracted region, the following additional process actions can be performed. First, the time when the previously stored histogram was stored and assigned is identified. It is then ascertained whether the previously stored histogram was stored within a prescribed threshold time frame in comparison to the current time. If it is determined that the previously stored histogram was not stored within the prescribed threshold time frame, then the histogram derived from the extracted region is stored as an additional model histogram and assigned to the cell containing the centroid of the extracted region. It is noted that under these storage criteria, more than one histogram could be created and stored for each person at each location. This would account for changes in the lighting conditions at a location over the course of the day.
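These storage criteria reduce to a small decision rule, sketched below. The sketch assumes that the time of the most recently stored histogram is tracked per (identity, cell) pair in a dictionary and that times are plain numbers in seconds. The names `should_store`, `stored_times`, and `max_age` are hypothetical.

```python
def should_store(stored_times, identity, cell, now, max_age):
    """Decide whether a recognized region histogram should be stored.

    stored_times maps (identity, cell) -> time of the most recently
    stored histogram (assumed bookkeeping). Store when nothing has been
    stored for this identity in this cell, or when the latest stored
    histogram is older than the prescribed time frame max_age.
    """
    last = stored_times.get((identity, cell))
    if last is None:
        return True
    return (now - last) > max_age


stored_times = {}
first = should_store(stored_times, "alice", (0, 2), now=100, max_age=60)
stored_times[("alice", (0, 2))] = 100   # histogram stored at t = 100
too_soon = should_store(stored_times, "alice", (0, 2), now=150, max_age=60)
stale = should_store(stored_times, "alice", (0, 2), now=200, max_age=60)
```

The first histogram for a cell is always stored, a second one arriving 50 seconds later is skipped, and one arriving 100 seconds later is stored because the earlier model falls outside the 60-second window.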