1. Field of the Invention
The present invention relates to an image processing method and apparatus for performing processing such as image recognition or image transformation.
2. Description of the Related Art
When image processing is applied to an image, the processing may be performed on only a selected part of the image. Alternatively, each region of the image may be subjected to a different image processing operation. In either case, it is necessary to determine which region of the image is to be processed by a given operation.
In various software applications, a user can select the region of an image in which image processing is to be performed. For example, in an image editing application such as Photo Editor® of Microsoft Corporation, a user can specify the region to be processed with a mouse before applying image processing, such as negative/positive inversion, to the image.
Alternatively, some image recognition methods narrow down candidate regions with a filtering process that has a light processing load before starting a recognition process that has a heavy processing load. In “HMM-based Sign Language Recognition using Hand Gesture and Hand Posture” (Yanagi, Yagyu, Tokuda, Kitamura, Proceedings of the Institute of Electronics, Information and Communication Engineers (IEICE) General Conference (Vol. 2004)), a skin color region is extracted from an image, the center point of each continuous skin color region is defined as a candidate position, and a hand posture is then obtained.
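The center-point approach described above can be sketched as follows. This is a minimal illustration, not the cited method itself: the RGB threshold rule in `is_skin`, the use of 4-connectivity, and the choice of the centroid as the "center point" are all assumptions made for the example, as are the function names.

```python
from collections import deque


def is_skin(rgb):
    """Crude RGB skin test. Hypothetical rule chosen for illustration only."""
    r, g, b = rgb
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and r - min(g, b) > 15)


def skin_mask(image):
    """Convert a 2D list of (r, g, b) tuples into a 2D list of booleans."""
    return [[is_skin(px) for px in row] for row in image]


def region_centers(mask):
    """Return one (row, col) centroid per 4-connected region of True cells."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    centers = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one continuous skin color region.
            queue = deque([(y, x)])
            seen[y][x] = True
            sum_y = sum_x = count = 0
            while queue:
                cy, cx = queue.popleft()
                sum_y += cy
                sum_x += cx
                count += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            centers.append((sum_y / count, sum_x / count))
    return centers
```

Each continuous region yields exactly one candidate position, so the number of candidates passed to the heavier recognition step equals the number of skin color regions, regardless of their size.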
On the other hand, there is a pattern recognition method in which all positions where skin color pixels exist are defined as candidate positions, as discussed in Japanese Patent Application Laid-open No. 2002-312796. In this method, high precision pattern detection is carried out after the candidate positions are obtained. There are a variety of methods for obtaining skin color pixels, some of which are discussed, for example, in “Analysis of Human Skin Color Images for a Large Set of Color Spaces and for Different Camera Systems” (Terrillon, Pilpre, Niwa, Yamamoto, IAPR Workshop on Machine Vision Applications (MVA 2002)).
In the first conventional technique described above, the user must specify the region to which image processing is applied. Such a method is effective in the many cases where the user's intention should be reflected. However, it is not suitable where automatic detection is desired, as in image recognition.
Further, when a hand is to be detected, if only one point in a continuous skin color region is defined as the hand candidate position, it is highly probable that detection of the hand will fail. This is because a natural image generally contains skin color pixels in portions other than the hand, and if there are skin color pixels around the hand, the position of the candidate point may be shifted depending on the distribution of the skin colors.
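The shift of the candidate point can be illustrated with a small numerical sketch. The pixel layouts below are hypothetical, and the centroid is assumed as the single candidate point; the example only shows how neighboring skin color pixels that merge with the hand region move that point.

```python
def centroid(pixels):
    """Mean (row, col) of a list of (row, col) pixel coordinates."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)


# Hypothetical 4x4 block of skin color pixels representing a hand
# (rows 0..3, columns 0..3).
hand = [(y, x) for y in range(4) for x in range(4)]

# Hypothetical adjacent skin color pixels (for example, a forearm) that form
# one continuous region with the hand (rows 0..3, columns 4..11).
neighbor = [(y, x) for y in range(4) for x in range(4, 12)]

hand_only_center = centroid(hand)            # candidate with the hand alone
merged_center = centroid(hand + neighbor)    # candidate for the merged region

# True when the merged candidate no longer falls on any hand pixel column.
candidate_left_hand = merged_center[1] > max(x for _, x in hand)
```

In this layout the single candidate moves from the middle of the hand block to a position inside the neighboring pixels, so a recognizer applied only at that point would not be looking at the hand.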
On the other hand, if all positions where skin color pixels exist are defined as face candidate positions, the face candidate positions are often not narrowed down effectively. In particular, where there is a wall of a single color similar to a skin color, the entire surface of the wall is defined as face candidate positions. Consequently, the face detection processing is executed a large number of times, which is not preferable from the viewpoint of processing speed.
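The effect of a skin-colored wall on the number of candidates can be sketched as follows. The threshold rule in `is_skin`, the scene sizes, and the colors are assumptions made for illustration; they are not taken from the cited publication.

```python
def is_skin(rgb):
    """Crude RGB skin test. Hypothetical rule chosen for illustration only."""
    r, g, b = rgb
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and r - min(g, b) > 15)


def candidate_positions(image):
    """Every skin color pixel becomes a face candidate position."""
    return [(y, x)
            for y, row in enumerate(image)
            for x, px in enumerate(row)
            if is_skin(px)]


def make_scene(width, height, wall_color, face_color, face_box):
    """Build a scene: a uniform wall with a rectangular face patch on it."""
    top, left, bottom, right = face_box  # bottom and right are exclusive
    return [[face_color if top <= y < bottom and left <= x < right
             else wall_color
             for x in range(width)]
            for y in range(height)]


FACE = (200, 120, 100)        # assumed skin color of the face patch
BLUE_WALL = (40, 60, 200)     # wall color clearly different from skin
BEIGE_WALL = (210, 160, 130)  # wall color similar to skin

blue_scene = make_scene(8, 6, BLUE_WALL, FACE, (2, 3, 4, 5))
beige_scene = make_scene(8, 6, BEIGE_WALL, FACE, (2, 3, 4, 5))

blue_count = len(candidate_positions(blue_scene))
beige_count = len(candidate_positions(beige_scene))
```

With the blue wall, only the face patch produces candidates; with the beige wall, every pixel of the scene becomes a candidate, so the heavy face detection step would run once per pixel of the image.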
Therefore, a method is desired that narrows down candidates more effectively than the conventional methods.