1. Field of the Invention
The present invention generally relates to high-efficiency compression encoding and decoding methods for video data of moving images, and particularly to a method and an apparatus for automatically switching the spatial resolution of video signals to be encoded in accordance with the properties of the images while the moving images are being encoded.
2. Description of the Related Art
In moving-image transmission, image compression is used to deliver the highest possible image quality within a limited bandwidth. For a given spatial resolution, compression methods such as MPEG-4 and H.263 are employed, for example. With such compression methods for moving images, it is desirable to switch the spatial resolution during encoding in accordance with the complexity of the images, within the same scene and within the same bit stream.
FIG. 1 shows a conventional switching apparatus for switching the spatial resolution. In FIG. 1, a switching apparatus 100 includes an encoding unit 101, a single frame averaged quantization size calculation unit 102, an originated data bit counter unit 103, a resolution decision unit 104 and a memory unit 105. A moving image signal 110 having a certain resolution is input to the encoding unit 101. First, the resolution is switched in accordance with a resolution 111 determined by the resolution decision unit 104. The encoding unit 101 then encodes the moving images into a bit stream 112 by a given compression method, and also outputs a quantization size 113 for each block, which is input to the single frame averaged quantization size calculation unit 102. The resolution decision unit 104 determines the resolution 111 using threshold values derived from the parameters QP1, QP2, FR1 and FR2, as described later, based on an originated data bit quantity 114 (the number of information bits) from the originated data bit counter unit 103, a quantization size 115 output by the single frame averaged quantization size calculation unit 102, and a previous resolution 116 output by the memory unit 105.
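The two per-frame statistics that the resolution decision unit 104 receives from units 102 and 103 can be illustrated with a minimal Python sketch. It assumes, as hypotheses not stated in the text, that the encoding unit 101 reports the quantization sizes of a frame's blocks as a list of integers and that the bits originated for that frame are available as a byte string; the helper `frame_stats` and the `FrameStats` type are illustrative names only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FrameStats:
    """Statistics of the most recently encoded frame, as read by unit 104."""
    qp_pre: float  # single frame averaged quantization size (output 115 of unit 102)
    bits: int      # originated data volume B in bits (output 114 of unit 103)


def frame_stats(block_qps: Sequence[int], frame_bitstream: bytes) -> FrameStats:
    """Hypothetical helper combining units 102 and 103 for one frame.

    block_qps: quantization size of each block (output 113 of encoding unit 101).
    frame_bitstream: bytes emitted for this frame (assumed byte-aligned).
    """
    if not block_qps:
        raise ValueError("a frame must contain at least one block")
    qp_pre = sum(block_qps) / len(block_qps)  # unit 102: average over all blocks
    bits = 8 * len(frame_bitstream)           # unit 103: count originated bits
    return FrameStats(qp_pre=qp_pre, bits=bits)


stats = frame_stats([10, 12, 14, 12], bytes(500))
print(stats)  # FrameStats(qp_pre=12.0, bits=4000)
```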
An example of conventional technology for the resolution decision unit 104 is described in ITU-T Document Q15-C-15, "Video Codec Test Model, Near-Term, Version 9", issued in December 1997. FIG. 2 shows the process flow of the resolution decision method disclosed in that document. This method is based on the principle that the product of an average quantization size (QPpre in FIG. 2) and an originated data size (B in FIG. 2) expresses the complexity of an image, i.e., its degree of difficulty in encoding.
FIG. 2 represents the operations performed for one frame. In a step 201, the necessary parameters are provided. Th1 and Th2 are threshold values. QPpre is the single frame averaged quantization size of the most recently encoded frame, and B is the originated data volume of that frame. QP1, QP2, FR1 and FR2 are parameters that determine the threshold values, and TB is the target bit rate. The threshold value Th1 applies to image complexity at the high spatial resolution: if the product of QPpre and B for the most recently encoded frame is larger than Th1, the image is judged to be exceedingly complex, and a low resolution is chosen for the image to be encoded. Conversely, the threshold value Th2 applies to image complexity at the low spatial resolution: if the product of QPpre and B is smaller than Th2, the image is judged not to be complex, and a high resolution is chosen for the image to be encoded. FR1 is the frame rate corresponding to the high resolution, and FR2 is the frame rate corresponding to the low resolution; FR1 and FR2 may be equal.
In a step 202, the threshold values Th1 and Th2 are determined. The threshold value Th1 is calculated by multiplying the target number of bits per frame at the high resolution, TB/FR1, by the parameter QP1. Similarly, the threshold value Th2 is calculated by multiplying the target number of bits per frame at the low resolution, TB/FR2, by the parameter QP2.
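Written as formulas, step 202 computes the two thresholds as shown below. Because TB/FR is a target number of bits per frame, each threshold has the same units (quantization size times bits) as the product QPpre·B with which it is compared.

```latex
\[
  Th_1 = QP_1 \cdot \frac{TB}{FR_1}, \qquad
  Th_2 = QP_2 \cdot \frac{TB}{FR_2}
\]
```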
In a step 203, the current spatial resolution is checked to determine whether it is high or low. If it is high, the process proceeds to a step 204; otherwise, it proceeds to a step 205.
In the step 204, given that the current spatial resolution is high, it is decided whether the resolution for the next frame should be lowered. Specifically, the product of the single frame averaged quantization size QPpre and the originated data volume B for the most recently encoded frame is compared with the threshold value Th1; if the product is larger than Th1, the spatial resolution for the next frame is lowered in a step 206.
In the step 205, given that the current spatial resolution is low, it is decided whether the resolution for the next frame should be raised. Specifically, the product of the single frame averaged quantization size QPpre and the originated data volume B for the most recently encoded frame is compared with the threshold value Th2; if the product is smaller than Th2, the spatial resolution for the next frame is raised in a step 207.
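The decision flow of FIG. 2 (steps 202 to 207) can be sketched in Python as follows. The function name, the `Resolution` enumeration and the parameter values in the usage lines are hypothetical choices for illustration; they are not taken from the Test Model document.

```python
from enum import Enum


class Resolution(Enum):
    HIGH = "high"
    LOW = "low"


def decide_resolution(qp_pre: float, bits: float, current: Resolution,
                      qp1: float, qp2: float, fr1: float, fr2: float,
                      target_bitrate: float) -> Resolution:
    """Return the spatial resolution for the next frame (FIG. 2, steps 202 to 207)."""
    # Step 202: each threshold scales the target bits per frame (TB / FR) by QP1 or QP2.
    th1 = qp1 * target_bitrate / fr1
    th2 = qp2 * target_bitrate / fr2
    # Complexity measure: product of QPpre and B for the most recently encoded frame.
    complexity = qp_pre * bits
    # Step 203: branch on the current spatial resolution.
    if current is Resolution.HIGH:
        # Steps 204 and 206: lower the resolution if the image is too complex.
        return Resolution.LOW if complexity > th1 else Resolution.HIGH
    # Steps 205 and 207: raise the resolution if the image is simple enough.
    return Resolution.HIGH if complexity < th2 else Resolution.LOW


# Hypothetical parameters: TB = 64 kbit/s, FR1 = FR2 = 10 frames/s, QP1 = 12, QP2 = 6.
params = dict(qp1=12, qp2=6, fr1=10, fr2=10, target_bitrate=64000)
print(decide_resolution(14, 8000, Resolution.HIGH, **params))  # Resolution.LOW
print(decide_resolution(8, 4000, Resolution.LOW, **params))    # Resolution.HIGH
print(decide_resolution(10, 5000, Resolution.LOW, **params))   # Resolution.LOW
```

With these illustrative parameters, Th1 = 76800 and Th2 = 38400. Because steps 204 and 205 compare against different thresholds depending on the current resolution, a product lying between Th2 and Th1 leaves the resolution unchanged, provided the parameters are chosen so that Th2 < Th1.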
The conventional technology described above has a problem. If the video content is relatively still with modest movement, that is, if it contains no discontinuities such as scene changes or abrupt movements, the product of the single frame averaged quantization size QPpre and the originated data volume B of the most recently encoded frame can serve as a measure of image complexity. However, when the content includes discontinuities such as scene changes and abrupt movements, the product no longer serves as that measure, since it describes the most recently encoded frame rather than the frame about to be encoded. Conventional technologies have nevertheless used the product as the measure of image complexity.
For this reason, the parameters have had to be adjusted manually during encoding whenever a video program contains images with abrupt motion, even when that motion may not be very noticeable to viewers. This has made real-time encoding impossible and has required expertise to set the parameters.
It is a general object of the present invention to provide a method and an apparatus that substantially obviates one or more of the problems caused by the limitations and disadvantages of the related art.
Another object of the present invention is to provide a method and an apparatus that automatically switch the spatial resolution of an image to be encoded in accordance with the properties of the image, even during the encoding process.
The above objects of the present invention are achieved by an automatic setting method of a spatial resolution for a moving image, comprising the steps of: dividing the moving image into blocks and compression encoding the moving image for each of the blocks; decoding the encoded moving image thus obtained; obtaining a block distortion ratio from the decoded image; and making a resolution decision to select a first resolution lower than a current spatial resolution if the block distortion ratio is greater than a first threshold value, or a second resolution higher than the current spatial resolution if the block distortion ratio is smaller than a second threshold value. In this manner, the block distortions that would otherwise be generated by the larger quantization size introduced to compress the data volume to the required bit rate are suppressed.
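The resolution decision step can be sketched in the same way. This section does not define how the block distortion ratio is computed, so the sketch takes it as an input. The ladder of candidate resolutions, the rule of moving one step at a time, and the threshold values in the usage lines are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Sequence


def select_resolution(distortion_ratio: float, current_index: int,
                      ladder: Sequence[tuple[int, int]],
                      first_threshold: float, second_threshold: float) -> int:
    """Return the ladder index of the spatial resolution for the next frame.

    ladder is a hypothetical list of (width, height) pairs ordered from lowest
    to highest resolution; current_index points at the current resolution.
    """
    if distortion_ratio > first_threshold and current_index > 0:
        # Too much block distortion: select a lower (first) resolution.
        return current_index - 1
    if distortion_ratio < second_threshold and current_index < len(ladder) - 1:
        # Little block distortion: select a higher (second) resolution.
        return current_index + 1
    # Otherwise, or if already at the end of the ladder, keep the current resolution.
    return current_index


# Hypothetical ladder (QCIF, CIF, 4CIF) and thresholds.
ladder = [(176, 144), (352, 288), (704, 576)]
print(select_resolution(0.30, 1, ladder, 0.20, 0.05))  # 0
print(select_resolution(0.01, 1, ladder, 0.20, 0.05))  # 2
print(select_resolution(0.10, 1, ladder, 0.20, 0.05))  # 1
```

Setting the second threshold below the first leaves a band of ratios in which the resolution stays unchanged, which keeps the encoder from switching back and forth on every frame.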
The above-mentioned objects of the present invention are also achieved by an apparatus for automatically setting a spatial resolution for a moving image, comprising: a first unit dividing the moving image into blocks and compression encoding the moving image for each of the blocks; a second unit decoding the encoded moving image thus obtained; a third unit obtaining a block distortion ratio from the decoded image; and a fourth unit making a resolution decision to select a first resolution lower than a current spatial resolution if the block distortion ratio is greater than a first threshold value, or a second resolution higher than the current spatial resolution if the block distortion ratio is smaller than a second threshold value.
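The four units of the apparatus can be wired together as in the following sketch, in which each unit is supplied as a callable. The class name `ResolutionSetter`, the stub units in the usage lines, and the assumption that each decision takes effect from the next frame are all hypothetical; the stubs perform no real compression or distortion measurement.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ResolutionSetter:
    """Hypothetical wiring of the four units, each supplied as a callable."""
    encode: Callable[[Any, int], bytes]       # first unit: block-based compression encoding
    decode: Callable[[bytes], Any]            # second unit: decodes the encoded frame
    distortion_ratio: Callable[[Any], float]  # third unit: block distortion ratio of the decoded image
    decide: Callable[[float, int], int]       # fourth unit: returns the next resolution index
    current: int = 0                          # index of the current spatial resolution

    def process(self, frame: Any) -> bytes:
        encoded = self.encode(frame, self.current)
        decoded = self.decode(encoded)
        ratio = self.distortion_ratio(decoded)
        # Assumption: the decision applies from the next frame onward.
        self.current = self.decide(ratio, self.current)
        return encoded


# Stub units for illustration only.
setter = ResolutionSetter(
    encode=lambda frame, res_index: bytes(frame),
    decode=lambda data: list(data),
    distortion_ratio=lambda image: sum(image) / (255 * len(image)),
    decide=lambda r, cur: max(cur - 1, 0) if r > 0.5 else (min(cur + 1, 2) if r < 0.1 else cur),
    current=1,
)
setter.process([255, 255, 0])
print(setter.current)  # 0
setter.process([0, 0, 0])
print(setter.current)  # 1
```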