Many computer vision applications, such as object detection, recognition, classification and tracking, rely on significant features of the objects. For example, in face recognition, the features associated with the eyes, nose and mouth are most relevant. In tracking an articulated object, such as a person, the important features are associated with the torso, limbs and head. Typically, a feature is defined by its size, location and descriptor. Because the appearance of the features can change drastically depending on lighting, motion, texture, pose variation, and occlusions, feature-based models are often constructed to improve the processing. Images can also be acquired from different viewpoints, which causes objects to appear to have different properties, such as size and speed, depending on their position in the image and the viewpoint characteristics.
To facilitate the processing of arbitrary images of objects, two normalization preprocessing steps are usually performed.
Image Normalization
First, the image is normalized. Image normalization makes the number of pixels and the aspect ratio the same in all images, e.g., 40×40 for faces, and 128×64 for bodies. The range of pixel intensity values can also be adjusted by contrast stretching and dynamic range expansion to, e.g., 0-255. Colors can also be adjusted.
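The image normalization described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited prior art: the function name `normalize_image`, its parameters, and the choice of simple index-based resampling are assumptions made for the example, and only NumPy is used.

```python
import numpy as np

def normalize_image(image, out_shape=(40, 40), out_range=(0, 255)):
    """Resize an image to a fixed shape, then stretch its contrast.

    Hypothetical helper: out_shape is (rows, cols), e.g. (40, 40) for
    faces or (128, 64) for bodies; out_range is the target intensity range.
    """
    image = np.asarray(image, dtype=np.float64)
    in_h, in_w = image.shape[:2]
    out_h, out_w = out_shape

    # Map each output pixel to a source pixel by proportional integer indexing.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    resized = image[rows[:, None], cols[None, :]]

    # Contrast stretching: map [min, max] of the image linearly onto out_range.
    lo, hi = resized.min(), resized.max()
    new_lo, new_hi = out_range
    if hi == lo:
        # A flat image has no contrast to stretch; return the lower bound.
        return np.full(resized.shape, new_lo, dtype=np.uint8)
    stretched = (resized - lo) / (hi - lo) * (new_hi - new_lo) + new_lo
    return np.round(stretched).astype(np.uint8)
```

Calling `normalize_image(img, out_shape=(128, 64))` would give the body-sized variant. A practical system would likely use an interpolating resampler rather than plain index selection; that choice is left open here.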
Object Normalization
Second, the object is normalized to fit in the normalized image. This can be done by making the size, location and orientation of the object consistent in all images, using scaling, translation and rotation, respectively.
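As a rough sketch of such an object normalization, the scaling, translation and rotation can be composed into one similarity transform in homogeneous coordinates. The function names, the parameterization of the object by a center, a size and an orientation angle, and the 40-pixel canonical frame are all assumptions made for illustration.

```python
import numpy as np

def object_normalization_matrix(center, size, angle_rad, target_size=40.0):
    """Build a 3x3 matrix mapping image coordinates to a canonical frame.

    Hypothetical parameterization: the object has a center (x, y), a
    size in pixels, and an in-plane orientation angle in radians.
    """
    cx, cy = center
    scale = target_size / size
    # Rotating by the negative angle undoes the object's orientation.
    c, s = np.cos(-angle_rad), np.sin(-angle_rad)

    to_origin = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
    rotate = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rescale = np.diag([scale, scale, 1.0])
    half = target_size / 2.0
    to_center = np.array([[1.0, 0.0, half], [0.0, 1.0, half], [0.0, 0.0, 1.0]])

    # Applied right to left: translate, rotate, scale, re-center.
    return to_center @ rescale @ rotate @ to_origin

def apply_transform(matrix, points):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    points = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ matrix.T)[:, :2]
```

Because a single global transform is applied to every point of the object, the features move together: distances between them are scaled by the same factor and their arrangement relative to each other does not change.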
However, even though the features may appear to relocate during the prior art image and object normalizations, the relative location and the description of the features within the image or object, with respect to each other, remain fixed.
Deformable and Articulated Objects
Alternative solutions segment a deformable or articulated object into multiple objects. For example, a human body is segmented into torso, head and limb objects. Similarly, a face can be segmented into eye, nose and mouth objects. Those methods then operate on the fixed feature in each object, and displacement of the feature within the object is not an issue, i.e., the features in the objects remain fixed in place. In fact, the entire object segment is usually treated as a feature, see Mikolajczyk et al., “Human detection based on a probabilistic assembly of robust part detectors,” Proc. European Conf. on Computer Vision, volume 1, pages 69-81, 2004; and Mohan et al., “Example-based object detection in images by components,” IEEE Trans. Pattern Anal. Machine Intell., 23(4):349-360, 2001. Effectively, the feature in each segmented object is processed the same as features in objects, see Felzenszwalb et al., “Pictorial structures for object recognition,” Intl. J. of Computer Vision, volume 61, 2005; and Felzenszwalb et al., “A discriminatively trained, multiscale, deformable part model,” Proc. IEEE Conf. on Computer Vision, 2008.
In all cases, the prior art features are fixed within objects as well as within segmented objects. In some cases, the objects are the features. The features are not displaceable within the objects or segmented objects, as defined herein.
Fixed features work well when the physical features of the object, e.g., the eyes in the case of a face, occupy the same relative physical location and size. However, if the object is deformable, or the features are otherwise displaced as in articulated objects, the problem becomes much harder. In fact, object detection is only tractable when the features are fixed; otherwise, the search space becomes prohibitively large.
Therefore, it is desired to normalize displaceable features and descriptors within objects.