1. Field
One embodiment of the present invention relates to an image processing apparatus and an image processing method, which recognize an object in an image.
2. Description of the Related Art
Conventionally, image processing apparatuses have been proposed which recognize a specific object in an image captured by a camera. Jpn. Pat. Appln. KOKAI Publication No. 2007-87089, for instance, discloses a gesture recognition apparatus which recognizes a hand gesture from an input image.
This gesture recognition apparatus executes a process of detecting a region of the hand from an input color image and finding the position of the hand. For the initially input frame image, a flesh color likelihood map is prepared by using a flesh color model which is prestored in a flesh color model database, and a plurality of candidate regions of a predetermined size for the region of the hand are set at random positions on the flesh color likelihood map. A candidate region in which the mean flesh color likelihood value is equal to or greater than a predetermined value is recognized as the region of the hand, and the position of the hand is found from the region of the hand by using a weighted mean based on the flesh color likelihood values of the pixels in that region. Further, a color histogram of the pixels in the region of the hand is prepared and stored as a reference color histogram. For each frame image which is input after the frame image in which the position of the hand was first found, candidate regions of a predetermined size for the hand are set at random in the input image, a color histogram is found for each candidate region, and its degree of similarity to the reference color histogram is examined. Then, the position of the hand is found by using a candidate region with a high degree of similarity as the region of the hand.
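The first-frame processing described above can be illustrated by the following minimal sketch. It is not taken from KOKAI Publication No. 2007-87089; the function name find_hand, the square candidate region, the number of candidates, the choice of the qualifying candidate with the highest mean, and the reading of the "weighted mean" as a likelihood-weighted centroid are all assumptions made for illustration only.

```python
import random

def find_hand(likelihood, size, threshold, n_candidates=200, rng=None):
    """Return ((row, col) hand position, (top, left) of region), or None.

    likelihood: non-empty 2-D list of flesh color likelihood values in [0, 1].
    size: side length of the square candidate region (assumed shape).
    threshold: the "predetermined value" for the mean likelihood.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    height, width = len(likelihood), len(likelihood[0])
    best = None
    for _ in range(n_candidates):
        # Place a candidate region at a random position on the map.
        top = rng.randint(0, height - size)
        left = rng.randint(0, width - size)
        cells = [(r, c, likelihood[r][c])
                 for r in range(top, top + size)
                 for c in range(left, left + size)]
        total = sum(v for _, _, v in cells)
        mean = total / len(cells)
        # Keep the qualifying candidate with the highest mean (assumption).
        if mean >= threshold and (best is None or mean > best[0]):
            best = (mean, top, left, cells, total)
    if best is None or best[4] == 0:
        return None  # no candidate reached the threshold
    _, top, left, cells, total = best
    # Likelihood-weighted centroid of the pixels in the hand region.
    row = sum(r * v for r, _, v in cells) / total
    col = sum(c * v for _, c, v in cells) / total
    return (row, col), (top, left)
```

Because the result depends entirely on the likelihood map, any error in the flesh color model propagates directly into the position of the hand that is found.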
As described above, in the gesture recognition apparatus disclosed in KOKAI Publication No. 2007-87089, the position of the hand in the first input frame image is found on the basis of color information (flesh color likelihood value), and the color histogram of the region of the hand thus found is stored as the reference color histogram. For each frame image which is input after the first frame image, candidate regions of a predetermined size for the hand are set, and the degree of similarity between the color histogram found for each candidate region and the reference color histogram is examined. Then, a candidate region with a high degree of similarity is taken as the region of the hand.
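The comparison of candidate regions against the reference color histogram can be sketched as follows. The publication as summarized here does not state which similarity measure is used; histogram intersection is substituted as an assumption, and the names color_histogram, intersection and best_candidate, as well as the quantization into 4 bins per channel, are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram of quantized (r, g, b) pixels, values 0..255.

    pixels must be a non-empty list of (r, g, b) tuples.
    """
    hist = {}
    for r, g, b in pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist[key] = hist.get(key, 0) + 1
    n = len(pixels)
    return {k: v / n for k, v in hist.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def best_candidate(candidates, reference):
    """Return (index, similarity) of the candidate most similar to reference.

    candidates: list of pixel lists, one per candidate region.
    """
    scores = [intersection(color_histogram(p), reference) for p in candidates]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]
```

In this sketch the reference histogram is computed once from the first frame image and is never updated, which mirrors the behavior described above.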
Specifically, in the conventional gesture recognition apparatus, the position of the hand used for preparing the reference color histogram is found on the basis of the color information (flesh color likelihood value). Normally, however, the colors in a color image vary depending on the photographing environment (e.g. the camera operating conditions, the kind of illumination, and a change in position of the light source), even when the same object (e.g. the hand) is photographed. Thus, there is a concern that the position of the hand used for preparing the reference color histogram cannot be found reliably because of the variation in color.
On the other hand, if the position of the hand is to be detected without using the color information, there are cases in which a plurality of patterns similar to the shape of the hand are present in the image, making it difficult to detect only the hand that is the object of recognition.
In addition, in the conventional gesture recognition apparatus, the degree of similarity between the reference color histogram, which is based on the first input frame image, and the color histogram, which is found from a subsequent frame image, is examined. Thus, if the photographing environment changes from that at the time of photographing the first frame image, the color histogram found from the subsequent frame image also changes, and it becomes difficult to correctly examine the similarity.
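The effect of a change in the photographing environment on the histogram comparison can be illustrated with a small worked example. Everything in it is hypothetical: the pixel values, the single color channel, the 8-bin quantization, the use of histogram intersection as the similarity measure, and the crude illumination model that merely scales pixel values.

```python
def channel_histogram(values, bins=8):
    """Normalized histogram of one 8-bit color channel (non-empty input)."""
    counts = [0] * bins
    for v in values:
        counts[min(v, 255) * bins // 256] += 1
    return [c / len(values) for c in counts]

def intersection(h1, h2):
    """Histogram intersection, used here as an assumed similarity measure."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def relight(values, gain_percent):
    """Crude illumination change (assumption): scale values, clip to 0..255."""
    return [min(255, v * gain_percent // 100) for v in values]

# Hypothetical red-channel values of the hand region in the first frame.
first_frame = [180, 185, 190, 200, 205, 210]
reference = channel_histogram(first_frame)

# Same hand, same lighting: similarity is 1.0.
same_light = intersection(reference, channel_histogram(first_frame))

# Same hand under 60 % illumination: every value moves to another bin.
dimmer = intersection(reference, channel_histogram(relight(first_frame, 60)))
```

With these values the similarity falls from 1.0 to 0.0 although the object is unchanged, so a fixed reference color histogram would no longer match the hand region in the later frame.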
Moreover, since the position of the “hand” is found by using the flesh color model that is stored in the flesh color model database, that is, absolute color information, it would be difficult to precisely recognize the “hand” unless the flesh color model database (color information) is prepared in consideration of individual differences between persons, e.g. persons with dark skin and persons with light skin.