The present disclosure relates to an image processing apparatus, an image processing method, and a computer program by which an object shape including a deformed part in an inputted image is recognized and estimated, and particularly relates to an image processing apparatus, an image processing method, and a computer program by which a plurality of pieces of shape information acquired in advance are resolved into basis spaces, and the shape of any object included in an inputted image is recognized and estimated by performing projection and back projection onto the basis spaces.
The “Active Shape Model (ASM)” and the “Active Appearance Model (AAM)” are known as techniques for modeling a visual event. These techniques use preliminary learning in which, through a statistical analysis such as principal component analysis (PCA) or independent component analysis (ICA), a plurality of given pieces of shape information (positions (coordinates) of a plurality of feature points defined in a face image, pixel values (such as brightness values), or the like) are resolved into (projected onto) a plurality of basis spaces and registered (for example, see: T. F. Cootes and C. J. Taylor, “Active shape models”, in D. Hogg and R. Boyle, editors, 3rd British Machine Vision Conference, pages 266-275, Springer-Verlag, September 1992; and T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active Appearance Models”, in Proc. European Conference on Computer Vision 1998 (H. Burkhardt & Neumann, Eds.), Vol. 2, pp. 484-498, Springer, 1998). In addition, the techniques make it possible to represent a certain shape by combining (performing back projection on) the registered basis spaces, and thus to recognize and estimate the shape of an object including a deformed part, such as a face. Moreover, the ASM/AAM make it possible to represent deformations of the face, for example, a change in the face orientation of a person or in the degree of opening of either eye or the mouth.
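The preliminary learning, projection, and back projection described above can be illustrated with a minimal sketch. It assumes PCA computed with NumPy's singular value decomposition; the function names (`learn_shape_basis`, `project`, `back_project`) and the synthetic training shapes are hypothetical and serve only as illustration, not as the implementation used in the cited works.

```python
import numpy as np


def learn_shape_basis(shapes, num_components):
    """Learn a mean shape and an orthonormal PCA basis.

    shapes: (num_samples, 2 * num_points) array; each row is one training
    shape flattened as x0, y0, x1, y1, ...
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_shape, vt[:num_components]


def project(shape, mean_shape, basis):
    """Projection: shape -> coefficients in the basis space."""
    return basis @ (shape - mean_shape)


def back_project(coeffs, mean_shape, basis):
    """Back projection: coefficients -> shape."""
    return mean_shape + basis.T @ coeffs


# Synthetic training set (assumption, illustrative only):
# one base shape deformed along a single deformation mode.
rng = np.random.default_rng(0)
base = rng.normal(size=10)            # 5 feature points, flattened
mode = rng.normal(size=10)
mode /= np.linalg.norm(mode)
weights = rng.normal(size=(20, 1))
training_shapes = base + weights * mode

mean_shape, basis = learn_shape_basis(training_shapes, num_components=1)

new_shape = base + 0.5 * mode         # lies in the learned shape space
coeffs = project(new_shape, mean_shape, basis)
reconstructed = back_project(coeffs, mean_shape, basis)
```

In this sketch a shape that lies within the learned shape space is reproduced by projection followed by back projection, while any component outside that space would be discarded.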
For example, there is proposed an image processing apparatus which sets a shape model and a texture model by using an AAM in the following manner. A specific feature model showing a specific feature of a face texture is set independently, and a correction texture model is set for textures other than the specific feature. Then, the specific feature model and the correction texture model are combined with each other, thereby setting the texture model with high accuracy and efficiency (for example, see JP 2010-244321A).
There is also proposed an image processing apparatus which locates a feature part of a face included in an image by using the AAM in the following manner. A texture correction is applied to at least one of a reference face image and a target face image so that the face textures of the reference face image and the target face image are made close to each other, and then feature part reliability is calculated based on the reference face image and the target face image, one of which has undergone the texture correction (for example, see JP 2010-244318A).
The ASM/AAM have an advantage in that repeating the projection and the back projection of any shape distribution leads to an output close to a shape registered in advance, that is, a shaped output. In addition, the ASM/AAM make it possible to implement, with a light processing load and at high speed, processing of tracking or fitting of a chief feature part (a face part) from a face region included in an inputted image.
However, a method such as the ASM/AAM, by which a plurality of pieces of shape information acquired in advance are resolved into basis spaces and a certain shape is represented by combining the basis spaces, has the following disadvantages.
(1) When a part of a shape (feature points) of a target object in an inputted image lies at a position deviating largely from its original position, the entire shape of the object is influenced by the deviating value and is displaced.
(2) When positions of a shape (feature points) are estimated by using a local feature quantity in an image, it is difficult to locate a region that is poor in features such as edges or textures.
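Disadvantage (1) can be illustrated with a small numerical sketch. It assumes a hand-built, hypothetical shape space of three feature points and a single basis vector that moves all points together along x; the values are chosen only for illustration and are not taken from any registered model.

```python
import numpy as np

# Hypothetical shape space: three feature points (x0, y0, x1, y1, x2, y2)
# and one unit-length basis vector that moves all points together along x.
mean_shape = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0])
basis = np.array([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0]]) / np.sqrt(3.0)

# Observed shape: only point 0 is wrong, displaced by 3.0 along x.
observed = mean_shape.copy()
observed[0] += 3.0

coeffs = basis @ (observed - mean_shape)        # projection
reconstructed = mean_shape + basis.T @ coeffs   # back projection

# Per-coordinate displacement of the output relative to the mean shape.
displacement = reconstructed - mean_shape
```

Here the single deviating point pulls the coefficient away from zero, and the back projection spreads that change over the whole shape: points 1 and 2 are each shifted by 1.0 along x although they were observed at their correct positions.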