The present invention relates to encoding a motion picture signal using band compression techniques, and more particularly to detecting an interest image region, for example the image of a speaker using a video telephone, and then encoding only the interest image region.
One prior approach to encoding a motion picture signal using band compression techniques is described in "A Color Motion Videophone for the ISDN", Report No. D-233, 1989 Spring Grand Conference, The Institute of Electronics, Information and Communication Engineers of Japan. According to this approach, a facial region is detected to generate a map representing the facial region, and an image encoding section performs interframe/intraframe adaptive predictive encoding of the picture elements of the current frame, using picture elements of the previous frame and adjacent picture elements of the current frame. When a picture element to be encoded lies in the facial region, the encoding is carried through to the final stage; otherwise, the encoding is stopped at the stage immediately before the final stage.
However, this prior approach still performs coarse encoding of the background part, that is, the part other than the facial region, so that noise in the background part generates unnecessary information. Further, when picture elements change from the background part to the facial region between consecutive frames, the encoding switches from coarse to fine, and considerable predictive error signals are generated, which results in still more unnecessary information. Accordingly, the encoding efficiency deteriorates.
Another prior approach is disclosed in "A method for facial region detection on a color video phone", Report No. D-92, 1989 Spring Grand Conference, The Institute of Electronics, Information and Communication Engineers of Japan. According to this conventional approach, a facial region is detected using histograms of the significant picture elements, that is, picture elements having values larger than a threshold value in a differential image between consecutive frames. First, a vertical histogram is generated by counting the significant picture elements in the differential image along each horizontal line, and is used to determine the top of the face. The image is then divided into horizontal band sub-areas, and a horizontal histogram is generated for each sub-area by counting the significant picture elements in that sub-area vertically. The width of the face is determined from the horizontal histograms of several sub-areas beneath the top position of the face, and the height of the face is determined in proportion to the face width.
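By way of illustration only, the histogram-based detection described above might be sketched as follows. The cited report does not give the decision rules or parameter values, so the function name `detect_face_box`, the thresholds `diff_threshold` and `min_count`, the band parameters `band_height` and `num_bands`, the proportionality factor `aspect`, and the rule of taking the outermost active columns over the bands are all assumptions introduced for this sketch.

```python
def detect_face_box(prev, cur, diff_threshold=20, min_count=2,
                    band_height=4, num_bands=3, aspect=1.3):
    """Estimate (top, left, width, height) of a facial region.

    prev and cur are equally sized 2-D lists of luminance values.
    All parameter values are illustrative assumptions, not taken
    from the cited report.
    """
    rows, cols = len(cur), len(cur[0])

    # Differential image: mark picture elements whose interframe
    # difference exceeds the threshold as significant.
    sig = [[abs(cur[y][x] - prev[y][x]) > diff_threshold
            for x in range(cols)] for y in range(rows)]

    # Vertical histogram: count significant elements along each row.
    row_hist = [sum(row) for row in sig]

    # Assumed rule: the top of the face is the first row whose
    # count reaches min_count.
    top = next((y for y, c in enumerate(row_hist) if c >= min_count), None)
    if top is None:
        return None  # no significant motion found

    lefts, rights = [], []
    for b in range(num_bands):
        y0 = top + b * band_height
        if y0 >= rows:
            break
        y1 = min(y0 + band_height, rows)

        # Horizontal histogram for this band: count significant
        # elements down each column within the band.
        col_hist = [sum(sig[y][x] for y in range(y0, y1))
                    for x in range(cols)]
        active = [x for x, c in enumerate(col_hist) if c >= 1]
        if active:
            lefts.append(active[0])
            rights.append(active[-1])

    if not lefts:
        return None

    # Assumed rule: use the outermost active columns over the bands.
    left, right = min(lefts), max(rights)
    width = right - left + 1

    # Height is taken proportional to the width, clipped to the frame.
    height = min(round(aspect * width), rows - top)
    return top, left, width, height
```

Calling `detect_face_box(prev, cur)` with two consecutive frames returns a bounding rectangle, or `None` when no row contains enough significant picture elements. The result is a rectangle derived from the histograms rather than a traced outline of the face.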
However, this approach does not directly detect the outline of the face, and therefore does not correctly extract the facial region for fine encoding. Further, the article does not suggest encoding only the facial part in order to reduce the unnecessary information caused by noise in the background part.