The present invention relates to a specified image-area extracting method and a specified image-area extracting device, and more particularly to a method of and a device for extracting, from an input image, a remarkable portion, e.g., a person's skin-color portion including the face, arms and the like, as a specified image-area. The method and device are usable in a video processing device, e.g., for producing video information suitable for use in video telephones and video conferences.
Up to now, face-area extraction by detecting a location of a face of a human figure in an input image and processing the detected area with priority has been proposed as an effective method to improve the quality of a video image. It is easily understood that a face portion is remarkable in an image displayed on a display screen of, e.g., a video telephone or a video conference apparatus in the field of video communication. It is, therefore, preferred to extract a face area from an image and suitably encode and quantize the extracted face area to improve the image quality.
Japanese Laid-Open Patent Publication No. 5-165120, which is referred to as a first example of a conventional face-area extracting method, discloses sequential steps of a face-area extracting process according to a featured image-data extracting method. In the first Step, noise components are removed from input data R, G and B. In the second Step, data R, G and B are converted to an H (hue) value, an S (saturation) value and an L (luminance) value. In the third Step, a two-dimensional histogram of hue value and saturation value is prepared by using a coordinate system with orthogonal axes for hue value (H), saturation value (S) and the number of pixels. In the fourth Step, the determined two-dimensional histogram is clustered by cutting out small peaks with a plane parallel to the coordinate plane and detecting the small peaks thus cut out. In the fifth Step, a large number of pixels are clustered on the basis of the small peaks detected in the two-dimensional histogram, and surrounding pixels are integrated together to form an integrated area. The input image scene (frame) is divided into areas according to the integrated area, and prospective areas of a person's face are extracted from the divided image. In the sixth Step, face areas are estimated from the extracted prospective face areas, and then data sets R, G and B for the estimated face areas are outputted.
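The second to fourth Steps described above can be illustrated by the following minimal sketch. It is not taken from the cited publication: the function names, the number of bins and the threshold standing in for the cutting plane are hypothetical, and Python's standard `colorsys` conversion is assumed in place of whatever H, S, L conversion the publication uses.

```python
import colorsys
from collections import Counter


def hue_saturation_histogram(pixels, bins=16):
    """Count pixels per (hue bin, saturation bin) cell.

    pixels: iterable of (R, G, B) tuples with 8-bit components (0..255).
    bins:   hypothetical number of bins along each axis.
    """
    hist = Counter()
    for r, g, b in pixels:
        # colorsys returns (hue, lightness, saturation), each in [0, 1].
        h, _lightness, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        h_bin = min(int(h * bins), bins - 1)
        s_bin = min(int(s * bins), bins - 1)
        hist[(h_bin, s_bin)] += 1
    return hist


def peaks_above(hist, threshold):
    """Keep only cells whose pixel count exceeds the cutting plane.

    The threshold plays the role of a plane parallel to the H-S plane;
    its value here is an assumption made for illustration.
    """
    return {cell: count for cell, count in hist.items() if count > threshold}


# Usage: three red pixels and two blue pixels.
sample = [(255, 0, 0)] * 3 + [(0, 0, 255)] * 2
hist = hue_saturation_histogram(sample)
peaks = peaks_above(hist, threshold=2)
```

The sketch stops at peak detection. The pixel clustering, area integration and face estimation of the fifth and sixth Steps are not shown.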
Similar techniques for determining the above-mentioned two-dimensional histogram of parameters H and S and dividing and extracting peaks corresponding to respective areas of an image are also proposed in Japanese Laid-Open Patent Publication Nos. 6-160993, 6-67320, 5-158164, 5-100328, 4-346334, 4-346333 and 4-346332. Another (second) example of a conventional area-extracting method is as follows:
A two-dimensional histogram is plotted for color difference values U and V, and a face area is taken to be included in a respective area of that histogram. That area is therefore extracted and outputted as a face area. This processing is conducted on all video frames.
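A two-dimensional histogram over U and V might be built as in the sketch below. The text does not say how U and V are derived, so the common ITU-R BT.601 analog definitions (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B - Y), V = 0.877(R - Y)) are assumed here; the function names and the bin width are likewise hypothetical.

```python
import math


def rgb_to_uv(r, g, b):
    """Return color difference values (U, V) for one RGB pixel.

    Assumes BT.601 luma weights and analog YUV scaling; the method
    described in the text may define U and V differently.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return u, v


def uv_histogram(pixels, bin_width=8):
    """Count pixels per (U bin, V bin) cell; bin_width is hypothetical."""
    hist = {}
    for r, g, b in pixels:
        u, v = rgb_to_uv(r, g, b)
        cell = (math.floor(u / bin_width), math.floor(v / bin_width))
        hist[cell] = hist.get(cell, 0) + 1
    return hist


# Usage: every pixel of every frame passes through rgb_to_uv.
frame = [(255, 0, 0), (255, 0, 0), (0, 0, 255)]
hist = uv_histogram(frame)
```

Because the conversion runs once per pixel per frame, the cost of this step grows with both frame size and frame rate.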
The first example of a conventional face-area extracting method divides an input image into areas by first preparing a two-dimensional histogram of hue and saturation and then extracting peaks of the histogram frequencies (the number of pixels). This method, however, has the problem that it is rather difficult to decide which peak corresponds to a skin-color area: in practice, the white race and the black race have different hue values, so that erroneous face-area extraction may arise depending upon the race.
The second example of a conventional face-area extracting method involves the following problem:
The color difference distributions of face areas of all races cannot be limited to two kinds; there are at least three kinds (the white race, the black race and the yellow race), because the color difference has a close correlation with luminance (brightness). Accordingly, this method may fail to extract a face area, depending upon the race. Furthermore, when the method is applied to video processing, the color space of all pixels in each frame must be transformed, which requires a large number of time-consuming operations.