Conventional moving picture encoding apparatuses generate a sequence of encoded information, i.e., a bit stream, by digitizing moving picture signals input from the outside and then performing encoding processing in conformity with a certain moving picture encoding scheme.
One of the moving picture encoding schemes is ISO/IEC 14496-10, Advanced Video Coding, which was recently approved as a standard (see Non-patent Document 1). Moreover, one known reference model for developing an encoder according to Advanced Video Coding is the JM (Joint Model) scheme.
In the JM scheme, each of the image frames making up a moving picture is divided into blocks of 16×16 pixels, each of which is generally referred to as an MB (Macro Block). Each MB is further divided into blocks of 4×4 pixels (which will be referred to as 4×4 blocks hereinbelow), the 4×4 block being used as the minimum elemental unit for encoding.
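As a concrete illustration, the block division described above can be sketched as follows in Python (the JM reference software itself is written in C; the array shapes and helper names here are illustrative assumptions only):

```python
import numpy as np

# QCIF luminance frame: 144 rows x 176 columns (values are placeholders).
frame = np.zeros((144, 176), dtype=np.uint8)

MB_SIZE = 16   # a macroblock (MB) is 16x16 pixels
BLK_SIZE = 4   # the minimum elemental unit for encoding is a 4x4 block

def split_into_blocks(image, size):
    """Divide an image (or an MB) into non-overlapping size x size blocks,
    scanned left to right, top to bottom."""
    h, w = image.shape
    return [image[y:y + size, x:x + size]
            for y in range(0, h, size)
            for x in range(0, w, size)]

mbs = split_into_blocks(frame, MB_SIZE)              # 9 * 11 = 99 MBs per QCIF frame
blocks_in_mb = split_into_blocks(mbs[0], BLK_SIZE)   # 16 4x4 blocks per MB
```

For a QCIF frame this yields 99 MBs (11 across, 9 down), each containing sixteen 4×4 blocks.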
FIG. 1 shows an example of the block division for an image frame in QCIF (Quarter Common Intermediate Format). Although an ordinary image frame is composed of brightness signals and color-difference signals, the following description will address only brightness signals for simplifying the explanation.
Now the operation of the JM scheme, in which an image frame is input and a bit stream is output, will be described hereinbelow with reference to FIG. 2.
Referring to FIG. 2, the JM scheme (conventional) is comprised of an MB buffer 101, a converting apparatus 102, a quantizing apparatus 103, an inverse-quantizing/inverse-converting apparatus 104, an entropy encoding apparatus 105, a bit-rate control apparatus 106, a frame memory A 107, an in-loop filtering apparatus 108, a frame memory B 109, an intra-frame predicting apparatus 110, an inter-frame predicting apparatus 111, a predicting scheme estimating apparatus 112, and a switch SW100.
The operation of the apparatuses will now be described.
The MB buffer 101 stores pixel values in an MB to be encoded of an input image frame.
The predicted values supplied by the intra-frame predicting apparatus 110 or the inter-frame predicting apparatus 111 are subtracted from the pixel values of the MB to be encoded supplied by the MB buffer 101 (which will be simply referred to as the input MB hereinbelow). The input image from which the predicted values have been subtracted is called a predictive error.
In inter-frame prediction, a current block to be encoded is predicted by referring to an image frame reconstructed in the past whose display time differs from that of the current image frame to be encoded, using the correlation between image frames in the temporal direction. In the following description, encoding that requires inter-frame prediction in decoding will sometimes be called inter-encoding, an inter-encoded MB will be called an inter-MB, and a predicted value generated by inter-frame prediction will be called an inter-frame predicted value or an inter-frame predicted image.
On the other hand, encoding that does not use the aforementioned inter-frame prediction in decoding will be called intra-encoding, and an intra-encoded MB will be called an intra-MB. In the JM scheme, intra-frame prediction may be used in intra-encoding. In intra-frame prediction, a current block to be encoded is predicted by referring to an image frame reconstructed in the past whose display time is the same as that of the current image frame to be encoded, using the correlation within the image frame in the spatial direction. A predicted value generated by intra-frame prediction will be called an intra-frame predicted value or an intra-frame predicted image hereinbelow.
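As one concrete instance of the spatial correlation described above, the DC mode of H.264/AVC 4×4 intra-frame prediction fills the block with the rounded mean of the reconstructed neighboring pixels above and to the left. A minimal sketch (the neighbor values are illustrative assumptions):

```python
import numpy as np

def intra4x4_dc_predict(above, left):
    """DC mode of 4x4 intra-frame prediction: when both neighbor rows
    are available, every pixel of the predicted block is
    (sum(above) + sum(left) + 4) >> 3, i.e. the rounded mean of the
    eight reconstructed neighboring pixels."""
    dc = (int(np.sum(above)) + int(np.sum(left)) + 4) >> 3
    return np.full((4, 4), dc, dtype=np.int32)

above = np.array([100, 102, 104, 106])   # reconstructed pixels above
left = np.array([98, 100, 102, 104])     # reconstructed pixels to the left
pred = intra4x4_dc_predict(above, left)  # flat 4x4 block of value 102
```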
An encoded image frame exclusively composed of the aforementioned intra-MB's will be called an I-frame. On the other hand, an encoded image frame composed of inter-MB's in addition to the aforementioned intra-MB's will be called a P-frame, and an encoded image frame containing inter-MB's that can be predicted by inter-frame prediction not only from one image frame but from two image frames simultaneously will be called a B-frame.
In general, when a scene is still, the pixel correlation between adjacent image frames is very high, so inter-encoding achieves more effective compression than intra-encoding. Thus, most image frames in a moving picture are encoded as P- or B-frames, in which inter-encoding can be used.
The converting apparatus 102 frequency-converts the aforementioned predictive error from the spatial domain into the frequency domain on a block-by-block basis, wherein the block is smaller than an MB. The predictive error converted into the frequency domain is generally called a transform coefficient. The frequency conversion may be an orthogonal transform such as the DCT (Discrete Cosine Transform) or the Hadamard transform; the JM scheme (conventional) employs an integer-precision DCT, in which the basis is expressed in integers, with a block size of 4×4 pixels.
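The 4×4 integer-precision DCT mentioned above uses an integer basis matrix, with the norm scaling folded into the quantization step so that the transform itself needs only integer arithmetic. A sketch of the forward transform:

```python
import numpy as np

# 4x4 integer transform matrix of H.264/AVC (integer approximation of
# the DCT basis; normalization is absorbed into quantization).
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward_transform_4x4(residual):
    """Separable integer transform of a 4x4 predictive-error block:
    Y = C * X * C^T."""
    return C @ residual @ C.T

# A flat (constant) predictive error concentrates all of its energy
# in the single DC coefficient Y[0, 0].
flat = np.full((4, 4), 3)
coeffs = forward_transform_4x4(flat)
```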
On the other hand, the bit-rate control apparatus 106 monitors the number of bits of the bit stream output by the entropy encoding apparatus 105 so as to encode the input image frame in a target number of bits. If the number of bits of the output bit stream is greater than the target number of bits, the apparatus 106 outputs a quantizing parameter indicating a larger quantization step size; conversely, if the number of bits of the output bit stream is smaller than the target number of bits, it outputs a quantizing parameter indicating a smaller quantization step size. The output bit stream is thus encoded to a number of bits closer to the target number of bits.
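The feedback performed by the bit-rate control apparatus can be sketched as a single update step (a deliberately simplified sketch under assumed step and clamp values, not the actual JM rate-control algorithm):

```python
def update_qp(qp, produced_bits, target_bits, step=1, qp_min=0, qp_max=51):
    """One feedback step of bit-rate control: if more bits were produced
    than targeted, raise the quantizing parameter QP (coarser
    quantization, fewer bits); if fewer, lower QP (finer quantization).
    QP is clamped to the H.264/AVC range 0..51."""
    if produced_bits > target_bits:
        return min(qp + step, qp_max)
    if produced_bits < target_bits:
        return max(qp - step, qp_min)
    return qp
```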
The quantizing apparatus 103 quantizes the aforementioned transform coefficients with a quantization step size corresponding to the quantizing parameter supplied by the bit-rate control apparatus 106. The quantized transform coefficient is sometimes referred to as level or quantized value (a quantized value to be intra-encoded will be referred to as quantized value in intra-encoding, and that to be inter-encoded will be referred to as quantized value in inter-encoding hereinbelow). The quantized values are entropy-encoded by the entropy encoding apparatus 105 and output as a sequence of bits, i.e., a bit stream.
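The relation between the quantizing parameter and the quantization step size, and the quantization itself, can be sketched as follows (the base step-size table lists the commonly quoted H.264/AVC values; the round-to-nearest rule is a simplification that ignores the dead zone discussed later):

```python
# Step sizes for QP = 0..5; in H.264/AVC the step size doubles
# for every increase of 6 in QP.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size corresponding to quantizing parameter qp."""
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

def quantize(coeff, qp):
    """Map one transform coefficient to its level (quantized value) --
    a simplified sketch of what the quantizing apparatus 103 does."""
    return int(round(coeff / qstep(qp)))
```

A larger QP thus divides each transform coefficient by a larger step, producing smaller levels and fewer bits after entropy encoding.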
The combination of the converting apparatus 102 and the quantizing apparatus 103 will sometimes be called a converting/quantizing apparatus 200 hereinbelow.
Subsequently, the inverse-quantizing/inverse-converting apparatus 104 applies inverse quantization to the levels supplied by the quantizing apparatus 103 for use in subsequent encoding, and further applies inverse frequency conversion thereto to bring them back into the original spatial domain.
The inversely quantized transform coefficient will be called inversely quantized transform coefficient or reconstructed transform coefficient hereinbelow. The predictive error brought back into the original spatial domain will be called reconstructed predictive error hereinbelow.
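The lossy nature of this round trip is what produces encoding noise: the rounding at quantization cannot be undone by inverse quantization. A minimal numeric sketch (step size chosen arbitrarily for illustration):

```python
def inverse_quantize(level, q_step):
    """Reconstruct a transform coefficient from its level; the rounding
    performed at quantization makes the round trip lossy."""
    return level * q_step

# With step size 4, coefficient 47 quantizes to level 12 and
# reconstructs to 48; the difference of 1 is quantization noise.
level = round(47 / 4)
reconstructed = inverse_quantize(level, 4)
noise = reconstructed - 47
```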
The frame memory A 107 stores, as a reconstructed frame, the values obtained by adding the predicted values to the reconstructed predictive errors.
After all MB's in the current image frame to be encoded have been encoded, the in-loop filtering apparatus 108 applies noise-reducing filtering to the reconstructed frame stored in the frame memory A 107.
The frame memory B 109 stores, as a reference frame, the image frame supplied by the in-loop filtering apparatus 108 after the noise-reducing filtering.
The intra-frame predicting apparatus 110 generates intra-frame predicted values from the reconstructed frame stored in the frame memory A 107 based on an MB type and an intra-frame predicting direction supplied by the predicting scheme estimating apparatus 112.
The inter-frame predicting apparatus 111 generates inter-frame predicted values from the reference frame stored in the frame memory B 109 based on an MB type and a motion vector supplied by the predicting scheme estimating apparatus 112.
The predicting scheme estimating apparatus 112 estimates the set of an intra-frame predicting direction and an intra-MB type in intra-frame prediction, and the set of a motion vector and an inter-MB type in inter-frame prediction, that give the minimum predictive error with respect to the input MB.
Based on the estimation result of the predicting scheme estimating apparatus 112, the switch SW100 selects as the predicted value the output of the intra-frame predicting apparatus 110 if intra-frame prediction minimizes the predictive error, and otherwise selects the output of the inter-frame predicting apparatus 111.
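The selection performed by the switch SW100 can be sketched by comparing the predictive error of the two candidate predicted images, here measured with the sum of absolute differences (the SAD measure and the tie-breaking rule are illustrative assumptions):

```python
import numpy as np

def sad(block, prediction):
    """Sum of absolute differences: a simple predictive-error measure."""
    return int(np.abs(block.astype(int) - prediction.astype(int)).sum())

def select_prediction(input_mb, intra_pred, inter_pred):
    """Sketch of the switch SW100: pick whichever predicted image gives
    the smaller predictive error for the input MB."""
    if sad(input_mb, intra_pred) < sad(input_mb, inter_pred):
        return "intra", intra_pred
    return "inter", inter_pred

input_mb = np.zeros((16, 16), dtype=np.uint8)
intra_pred = np.ones((16, 16), dtype=np.uint8)   # error 256 vs input
inter_pred = np.zeros((16, 16), dtype=np.uint8)  # error 0 vs input
mode, pred = select_prediction(input_mb, intra_pred, inter_pred)
```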
The JM scheme thus encodes a moving picture by executing such processing.
When encoding a moving picture for the purpose of broadcasting or storage, the moving picture encoding using the inter-encoding cyclically inserts an I-frame to allow partial replay (decoding) starting from some midpoint. A simple example for this is shown in FIG. 3.
However, as a side effect of the cyclic I-frame insertion, visually conspicuous flickering occurs in the cycle of I-frame insertion (referred to as I-frame flickering hereinbelow). As the bit rate in encoding a moving picture is lowered, the I-frame flickering becomes more conspicuous and deteriorates subjective image quality.
The reason why the I-frame flickering is visually conspicuous is that the noise pattern in intra-encoding of an I-frame is different from that in inter-encoding of a P-frame displayed immediately before. This difference in the noise pattern in encoding is caused by a difference in prediction structure between the I- and P-frames (FIG. 4).
A means for reducing such flickering that can be contemplated is adaptive quantization, in which the quantization step size is made finer in regions of high visual sensitivity, as disclosed in Patent Document 1.
Another means for reducing such flickering that can be contemplated is a method involving using a near-optimal ratio of the quantization step sizes for I-, P- and B-frames to keep constant image quality, as disclosed in Patent Document 2.
Still another means for reducing such flickering that can be contemplated is a method of reducing flickering between consecutive P-frames by forcing the level of the predictive error to zero whenever the residual signal of a P-frame is smaller than the quantization dead-zone width, as disclosed in Patent Document 3.
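The idea described above can be sketched as a modified quantization rule (a sketch of the general idea only; the actual thresholding in Patent Document 3 may differ, and the dead-zone width here is an assumed parameter):

```python
def quantize_with_flicker_suppression(coeff, q_step, dead_zone):
    """If a P-frame residual coefficient falls inside the quantization
    dead zone, emit level 0 so the still region is copied unchanged
    from the reference frame; otherwise quantize normally."""
    if abs(coeff) < dead_zone:
        return 0
    return int(round(coeff / q_step))
```

With level 0, the reconstructed block equals its inter-frame predicted value, so a still region does not fluctuate from P-frame to P-frame.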
Still another means for reducing such flickering that can be contemplated is a method involving encoding all consecutive frames as I-frames and keeping the level uniform in a region decided to be still, as disclosed in Non-patent Document 2.
Still another means for reducing such flickering that can be contemplated is a method involving encoding all consecutive frames as I-frames and substituting the level in a region decided to be still with that of a previously encoded image frame, as disclosed in Patent Document 4.
Still another means for reducing such flickering that can be contemplated is a method involving, in a region decided to be still in encoding an I-frame, estimating an intra-frame predicting direction taking account of similarity to an I-frame encoded immediately before to prevent fluctuation in the intra-frame predicted value, as disclosed in Non-patent Document 3.
Patent Document 1: Japanese Patent Application Laid Open No. 2002-335527
Patent Document 2: Japanese Patent Application Laid Open No. 1993-111012
Patent Document 3: Japanese Patent Application Laid Open No. 1996-251593
Patent Document 4: Japanese Patent Application Laid Open No. 2003-235042
Non-patent Document 1: ISO/IEC 14496-10 Advanced Video Coding
Non-patent Document 2: Iguchi, et al., “A Method of Reducing Intra-mode Flickering in H.264 Encoding,” FIT 2003, J-040, 2003
Non-patent Document 3: Sakaida, et al., “Intra-frame Flicker Suppression in AVC/H.264 Encoding Using Adaptive Quantization,” FIT 2004, LJ-009, 2004