The present invention relates to an image encoding method for dividing an input image into two-dimensional blocks and orthogonally transforming, quantizing and encoding each of the blocks. More particularly, it relates to an image encoding and area extracting device which is capable of extracting a specified area and a motion area from an input image and of quantizing and encoding each of the extracted areas under control.
Recently, with an increasing demand for image communication services such as videophones and videoconferencing that make effective use of ISDN (Integrated Services Digital Networks) and PSTN (Public Switched Telephone Networks), a number of studies have been made to develop methods for encoding image information more efficiently for transmission. These studies are directed to reducing the amount of information included in an image by removing redundancy therefrom, using its statistical characteristics. A "hybrid" encoding method is well known, which uses a combination of motion compensative prediction with discrete cosine transformation. However, image data encoded by the hybrid encoding method may be reproduced with noise elements when the data is transmitted at a low bit-rate. This problem had to be solved.
Accordingly, a way of improving image quality was studied wherein each of specified areas is extracted from an input image and is then quantized at a controlled quantizer stepsize (quantizing interval). For example, there is an idea that a face area is extracted from an input image and other areas (hereinafter referred to as the background area) of the image are quantized at a quantizer stepsize larger than that of the face area, i.e., with a smaller amount of codes, making it possible to assign a large amount of codes to the encoding of the face area and thereby to improve the subjective quality of the image (R. H. J. N. Plompen, et al.: "An Image knowledge based video codec for low bitrates," SPIE Vol. 804 Advanced in image processing, 1987).
An example of conventional image encoding that uses a motion compensative prediction method together with a two-dimensional orthogonal transform technique is described as follows:
Image sequences taken by a television camera are digitized and inputted into the frame memory, wherein each input image (frame) is divided into blocks each consisting of N×M pixels (N and M are natural numbers) and stored. A subtracter determines, block by block, the difference between the input image stored in the frame memory and a motion-compensated prediction value from the motion compensative predicting portion. The orthogonal transforming portion performs a two-dimensional orthogonal transform of the pixels of each block and transmits the obtained transform coefficients to the quantizing portion, which in turn quantizes the received coefficients at a quantizer stepsize outputted by the encoding control portion. The encoding portion conducts entropy encoding of the quantized output of the quantizing portion and generates coded information.
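The block division, subtraction, transform and quantization steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the orthogonal transform is taken to be the discrete cosine transform mentioned earlier for the hybrid method, the quantizer is assumed to be a plain uniform rounding quantizer, and the function names (`split_blocks`, `dct2`, `encode_block`) are hypothetical and do not come from the described device.

```python
import math

def split_blocks(frame, n, m):
    """Divide a frame (list of pixel rows) into n x m blocks in raster order.
    Assumes the frame height and width are multiples of n and m."""
    return [[row[c:c + m] for row in frame[r:r + n]]
            for r in range(0, len(frame), n)
            for c in range(0, len(frame[0]), m)]

def dct_matrix(size):
    """Orthonormal DCT-II basis matrix (assumed choice of orthogonal transform)."""
    rows = []
    for k in range(size):
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * size))
                     for i in range(size)])
    return rows

def dct2(block):
    """Two-dimensional transform of an n x m block: C_n * block * C_m^T."""
    n, m = len(block), len(block[0])
    cn, cm = dct_matrix(n), dct_matrix(m)
    tmp = [[sum(cn[u][y] * block[y][x] for y in range(n)) for x in range(m)]
           for u in range(n)]
    return [[sum(tmp[u][x] * cm[v][x] for x in range(m)) for v in range(m)]
            for u in range(n)]

def quantize(coeffs, step):
    """Uniform quantizer (assumption): divide by the stepsize and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def encode_block(current, prediction, step):
    """Subtract the prediction, transform the difference, then quantize it."""
    residual = [[c - p for c, p in zip(cur_row, pred_row)]
                for cur_row, pred_row in zip(current, prediction)]
    return quantize(dct2(residual), step)
```

A larger `step` maps more coefficients to zero, which is why the stepsize supplied by the encoding control portion directly governs the amount of codes generated for a block.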
The buffer memory stores therein the coded information in a form suited to the transmission rate of a communication line.
The output signal from the quantizing portion is also given to the inverse quantizing portion, wherein it is inversely quantized to recover the transform coefficients. The inverse orthogonal transforming portion performs an inverse two-dimensional orthogonal transform of the transform coefficients, the adder sums the inversely transformed image and the motion-compensated prediction value from the motion compensative predicting portion, and the summed image is stored in the frame memory. The reconstructed image stored in the frame memory and the current image stored in the frame memory are inputted into the motion detecting portion, which in turn detects motion vectors.
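The text does not state how the motion detecting portion finds motion vectors. One common approach, shown here purely as an assumption, is full-search block matching that minimizes the sum of absolute differences (SAD) between a block of the current image and displaced blocks of the reconstructed image. The names `sad` and `find_motion_vector` and the `search` range parameter are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def find_motion_vector(cur, ref, bx, by, n, search):
    """Full search over displacements within +/- `search` pixels (assumed method).
    Returns ((dx, dy), cost) for the displacement with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference image.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The vector returned for each block tells the motion compensative predicting portion where in the reconstructed image to fetch its prediction from.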
The motion compensative predicting portion determines a motion-compensated prediction value from the motion vectors and the reconstructed image stored in the frame memory, and decides which mode, interframe or intraframe, is applied for prediction by comparing the power of the input image with the power of the difference between the motion-compensated prediction value and the input image. The encoding control portion receives effective/ineffective information representative of a face area and a background area from the area extracting portion, together with information on the occupation of the buffer memory with coded information, and determines the respective quantizer stepsizes (intervals) for quantizing the face area and the background area. For instance, when the area extracting portion judges the face area to be effective and the background area to be ineffective, the encoding control portion determines a reference quantizer stepsize on the basis of the occupied size of the buffer memory and selects a smaller quantizer stepsize for the face area than for the background area.
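The interframe/intraframe decision described above can be illustrated with a small sketch. The exact comparison rule is not given in the text, so the following assumptions are made: the "power" of the input block is measured as its variance about the block mean, the power of the difference is its mean square, and interframe prediction is chosen whenever the difference power does not exceed the input power. The function names are hypothetical.

```python
def mean_square(values):
    """Average of the squared values."""
    return sum(v * v for v in values) / len(values)

def choose_mode(block, prediction):
    """Return "inter" or "intra" for one block (assumed decision rule)."""
    pixels = [p for row in block for p in row]
    preds = [p for row in prediction for p in row]
    mean = sum(pixels) / len(pixels)
    # Assumed measure of input power: variance about the block mean.
    input_power = mean_square([p - mean for p in pixels])
    # Power of the difference between the input and its motion-compensated prediction.
    diff_power = mean_square([p - q for p, q in zip(pixels, preds)])
    return "inter" if diff_power <= input_power else "intra"
```

With this rule a block is coded interframe only when the prediction leaves less energy to encode than the block itself contains.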
The above-mentioned method provides only two kinds of quantizer stepsizes, one for a face area and one for a background area, and discloses only that the quantizer stepsize for the face area is smaller than that for the background area. Accordingly, when the method is applied to a practical image sequence encoding device, values dQf and dQb, which may vary depending upon the result of extraction by the area extracting portion, are defined relative to a quantizer stepsize Q determined according to the occupation of the buffer memory with coded information, and, for example, the values Q-dQf and Q+dQb may be applied for quantizing the face area and the background area respectively. This method, however, may quantize noise signals existing in the background area, producing extra coded information and correspondingly reducing the amount of codes assigned to the face area.
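The stepsize selection just described (Q derived from buffer occupancy, then Q-dQf for the face area and Q+dQb for the background area) can be sketched as follows. The linear mapping from buffer occupancy to Q and the stepsize range of 1 to 31 are assumptions made for illustration; the text specifies neither. The names `reference_step` and `area_steps` are hypothetical.

```python
Q_MIN, Q_MAX = 1, 31  # assumed stepsize range, not stated in the text

def clamp(value):
    """Keep a stepsize inside the assumed valid range."""
    return max(Q_MIN, min(Q_MAX, value))

def reference_step(buffer_fill, buffer_size):
    """Reference stepsize Q: grows linearly with buffer occupancy (assumed mapping)."""
    ratio = min(max(buffer_fill / buffer_size, 0.0), 1.0)
    return Q_MIN + round(ratio * (Q_MAX - Q_MIN))

def area_steps(q, dqf, dqb):
    """Return (face stepsize, background stepsize) as (Q - dQf, Q + dQb)."""
    return clamp(q - dqf), clamp(q + dqb)

q = reference_step(50, 100)          # half-full buffer
face_q, background_q = area_steps(q, 4, 6)
```

Because the background stepsize is only offset from Q and never suppresses coefficients outright, noise in the background is still quantized and coded, which is the drawback noted above.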
The face area extraction is conducted in such a manner that a human figure is first extracted from an input image by using several kinds of human-figure extracting templates and a face area is then taken out therefrom. This method cannot realize flexible extraction of a face area in an image because the use of templates limits the size and position of a human figure that can be extracted from the image. Furthermore, no correlation is provided between the extracted face area of the current image and that of a preceding image. This may result in reproducing an image of decreased quality with discontinuity in the face area portion thereof.
For an image having a large motion area, a relatively large amount of codes must be assigned to the motion area and the allocation of codes to a face area is correspondingly reduced, so that the quality of the face area image may not be improved even if the face area has been properly extracted.
Furthermore, improvement of the quality of a face image by changing the quantizer stepsize may not be expected when the face area occupies a relatively large or small portion of the screen image.
If an image in which a scene change occurs is encoded with a reduced amount of codes assigned to its background area, it may be reproduced with an unstable background quality that is unpleasant to the eyes of an observer.
It is also unreasonable that pixels of the eyes, nose and mouth in a face area, which may vary sharply in brightness, and pixels of the skin portion thereof are quantized at the same quantizer stepsize.
In addition, there still remains the problem that a face area of a certain increased size may have a decreased encoding efficiency.