In the field of digital moving-image processing, one method for compressing and encoding a large amount of information is the standard ISO/IEC 13818, also known as MPEG-2, which specifies the encoding of digital video and accompanying audio. Part 2 of MPEG-2 (ISO/IEC 13818-2) specifies video coding. Bitstreams generated in compliance with the MPEG-2 standard are widely used in communication, television broadcasting and the like (see Non-Patent Document 1).
A bitstream compliant with MPEG-2 has a hierarchical structure that consists, from the top down, of a sequence layer, a GOP (Group of Pictures) layer, a picture layer, a slice layer, a macroblock layer, and a block layer.
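The layered structure can be pictured as nested containers. The following is only an illustrative sketch: the class and field names are hypothetical and do not correspond to syntax elements of the standard, and header data carried by each layer is omitted.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical containers mirroring the six layers named in the text.
@dataclass
class Block:
    coefficients: List[int] = field(default_factory=lambda: [0] * 64)  # one 8x8 DCT block

@dataclass
class Macroblock:
    blocks: List[Block] = field(default_factory=list)

@dataclass
class Slice:
    macroblocks: List[Macroblock] = field(default_factory=list)

@dataclass
class Picture:
    picture_type: str = "I"  # "I", "P" or "B"
    slices: List[Slice] = field(default_factory=list)

@dataclass
class GroupOfPictures:
    pictures: List[Picture] = field(default_factory=list)

@dataclass
class Sequence:
    gops: List[GroupOfPictures] = field(default_factory=list)

def count_blocks(sequence: Sequence) -> int:
    """Walk every layer from the sequence down to the blocks and count the blocks."""
    return sum(
        len(mb.blocks)
        for gop in sequence.gops
        for pic in gop.pictures
        for sl in pic.slices
        for mb in sl.macroblocks
    )

def make_macroblock() -> Macroblock:
    # Assumes 4:2:0 sampling: four luminance blocks and two chrominance blocks.
    return Macroblock(blocks=[Block() for _ in range(6)])

example = Sequence(gops=[GroupOfPictures(pictures=[
    Picture("I", [Slice([make_macroblock(), make_macroblock()])]),
    Picture("P", [Slice([make_macroblock(), make_macroblock()])]),
])])
```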
In MPEG-2, when moving images constituted by a series of images are processed, each image frame is stored in a frame memory. Temporal redundancy is exploited by taking a motion-compensated difference between frames, and spatial redundancy is exploited by applying a discrete cosine transform (hereinafter abbreviated as “DCT”) to the pixels constituting each frame. Together, these realize efficient compression and encoding of moving images.
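The two kinds of redundancy can be illustrated with a small sketch. It assumes an 8x8 block, a plain floating-point two-dimensional DCT-II, and a frame difference with zero motion (a real encoder would first search for a motion vector); the function names are illustrative only.

```python
import math

N = 8  # assumed block size (8x8 pixels)

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def frame_difference(current, reference):
    """Pixel-wise difference between co-located blocks (zero-motion simplification)."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]

# Spatial redundancy: a flat block collapses into a single DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)

# Temporal redundancy: an unchanged block leaves an all-zero residual.
residual = frame_difference(flat, flat)
```

For the flat block, only `coeffs[0][0]` is significantly different from zero, so almost nothing remains to be coded after quantization; likewise the residual of an unchanged block is all zeros.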
An encoded signal is transmitted to a decoder, where it is decoded and played back. MPEG-2 generally realizes its high compression ratio using inter-frame coding. In the decoder, an image frame is decoded and stored in a first frame memory, a succeeding image frame is predicted on the basis of motion information and stored in a second frame memory, and an image frame to be inserted between them is then predicted from the two frames, so that a series of image frames is constituted and the moving images are reconstructed. This method is called bidirectional prediction.
In MPEG-2, in order to realize this bidirectional prediction, three types of frames, namely, an I-frame, a P-frame, and a B-frame, are defined. An I-frame is an intra-coded frame, that is, an image encoded as a still image independently of the other frames. A P-frame is a forward predictive coded frame, that is, an image predicted and encoded on the basis of an I-frame or a P-frame located in the past. A B-frame is a bidirectionally predictive coded frame, that is, an image predicted and encoded on the basis of forward and backward reference frames, each of which is an I-frame or a P-frame located in the past or the future. Accordingly, the I-frames and the P-frames are encoded first, and the B-frames to be inserted between them are encoded afterwards (see Non-Patent Document 2).
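Because a B-frame needs a future I-frame or P-frame as a reference, the order in which frames are encoded differs from the order in which they are displayed. A minimal sketch of this reordering, assuming the frame types are given as a string in display order (the function name is illustrative):

```python
def coding_order(display_types):
    """Return display-order indices rearranged into coding order.

    Each B-frame is held back until the next I-frame or P-frame
    (its future reference) has been emitted.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)      # anchor frame (I or P) goes first
            order.extend(pending_b)  # then the B-frames that precede it in display order
            pending_b = []
    order.extend(pending_b)          # trailing B-frames with no future anchor
    return order

print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```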
Since inter-frame coding performs compression using the correlation between consecutive frames, image quality can easily deteriorate if compression and decompression are repeated. Therefore, in video editing systems or video servers requiring high image quality, only intra-frame coding is used in some cases. For example, there is a case in which only I (intra) frames are used in MPEG-2. In MPEG-2, rate control is generally executed in units of a GOP, which includes a plurality of frames such as P-frames and B-frames in addition to I-frames; a similar rate control method can be applied to the encoding of only I-frames, hereinafter referred to as “I-frame-only encoding”.
An MPEG-2 bitstream encoded by an encoder is sent out to a transmission path operable at a predetermined transfer rate, input to a decoder on the transmission path, and then decoded and played back. However, the amount of information, i.e. the entropy, of moving picture signals is not constant. In order to encode the moving picture signals and send them out in the form of a bitstream to the transmission path at a fixed rate, it is necessary to carry out rate control so that the encoded signals have a constant amount of information per unit time.
Prior-art rate control for image encoding will be described with reference to FIG. 8. A prior-art image encoding apparatus 100 shown in FIG. 8 is provided with an input portion 101 to which digitized image data is input, a blocking processing portion 102, an encoding portion 103, a FIFO buffer 104, a rate control portion 105, and an output portion 106. The blocking processing portion 102 partitions the image data input from the input portion 101 into blocks and generates macroblocks, which are the base units of encoding. The encoding portion 103 applies DCT, quantization, and variable-length encoding to the macroblocks generated by the blocking processing portion 102 and generates a bitstream. The FIFO buffer 104 temporarily accumulates the bitstream supplied from the encoding portion 103 and outputs it through the output portion 106. The rate control portion 105 carries out feedback rate control, in which the quantization parameters used by the encoding portion 103 for quantizing the subsequent image data are controlled on the basis of the occupancy of the FIFO buffer 104. The quantization parameters determine the coarseness or fineness of the quantization, which in turn directly influences the number of bits encoded per second. If the quantization becomes coarse, less data is retained and the quality of the encoded image data deteriorates. If the quantization becomes fine, more data is retained and the quality of the encoded image data improves. However, if the quantization is too fine, the number of encoded bits per second exceeds the allocated bandwidth, and rate control needs to be applied again so that the bitstream fits in the limited bandwidth.
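The feedback loop between the FIFO buffer 104 and the rate control portion 105 can be sketched as follows. This is a toy simulation, not the apparatus of FIG. 8: the rate model (bits produced = complexity / quantizer scale), the linear mapping from buffer fullness to quantizer scale, and all names and numeric values are assumptions made for illustration.

```python
def simulate_feedback_rate_control(complexities, bits_per_frame, buffer_size):
    """Adjust the quantizer scale for the next frame from the buffer occupancy.

    Returns one (quantizer_used, bits_produced, occupancy_after) tuple per frame.
    """
    q_min, q_max = 1, 31            # assumed quantizer scale range
    occupancy = 0.0
    q = q_min
    history = []
    for complexity in complexities:
        q_used = q
        produced = complexity / q_used            # toy rate model (assumption)
        # The buffer fills with the encoded bits and drains at the fixed channel rate.
        occupancy = max(0.0, occupancy + produced - bits_per_frame)
        # Feedback: a fuller buffer selects a coarser quantizer for the next frame.
        fullness = min(1.0, occupancy / buffer_size)
        q = round(q_min + fullness * (q_max - q_min))
        history.append((q_used, produced, occupancy))
    return history

# Three easy frames followed by three complex frames (for example, after a scene change).
history = simulate_feedback_rate_control(
    complexities=[1000, 1000, 1000, 8000, 8000, 8000],
    bits_per_frame=1000,
    buffer_size=10000,
)
```

In this run the first complex frame is still encoded with the fine quantizer chosen earlier, so the buffer occupancy jumps; only the following frames are quantized more coarsely. The correction always lags the change in the input by one step, which is characteristic of feedback control.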
Since a high bit rate is required for I-frame-only encoding, the decoder buffer does not have enough room in many cases. I-frame-only encoding therefore requires stricter rate control, and the rate control takes more time than usual. In addition, because of the nature of feedback rate control, the rate of I-frames may differ frame by frame. In particular, at edit points such as a scene change, cut, or insert, the rate increases instantaneously, which might violate the buffer-management constraint and thus cause a buffer failure.
A DV codec does not have the above problem, since its rate control is executed so that all I-frames have the same size. In order to carry out this rate control efficiently, a method called macroblock shuffling is used (for example, see Patent Document 1). A macroblock is an assembly of DCT blocks representing luminance and chrominance. Since MPEG-1 and MPEG-2 also have a unit corresponding to this macroblock, the term “macroblock” herein is assumed to include the corresponding units of MPEG-1 and MPEG-2.
In macroblock shuffling, a plurality of macroblocks (for example, five) located at spatially random positions are collected and then encoded. Here, the code length of an encoded macroblock is referred to as the “rate of a macroblock”, and the assembly of macroblocks collected as above is referred to as a “segment”. Rate control is performed so that the total rate of the macroblocks collected and encoded in a segment does not exceed a target rate. With this rate control, bits are allocated fairly over the whole image data even though the rate control is performed on the small unit of a segment. In addition, since the number of macroblocks included in a segment can be made small, the rate control can be executed at a high speed, and the required circuit scale can be made small.
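The segment-based rate control described above can be sketched as follows. The sketch makes several assumptions that are not taken from the cited documents: a seeded pseudo-random permutation stands in for the shuffling pattern, the rate of a macroblock is modeled as its complexity divided by the quantizer scale (rounded up), and the finest quantizer scale that meets the target rate is chosen for each segment. All names and numeric values are illustrative.

```python
import random

def make_segments(num_macroblocks, segment_size=5, seed=0):
    """Group macroblock indices taken from shuffled positions into segments."""
    indices = list(range(num_macroblocks))
    random.Random(seed).shuffle(indices)  # stand-in for the shuffling pattern (assumption)
    return [indices[i:i + segment_size]
            for i in range(0, num_macroblocks, segment_size)]

def macroblock_rate(complexity, q):
    """Toy model of the code length of one macroblock: ceil(complexity / q)."""
    return -(-complexity // q)

def choose_quantizer(segment, complexities, target_rate, q_max=31):
    """Pick the finest quantizer scale whose segment rate does not exceed the target."""
    for q in range(1, q_max + 1):
        rate = sum(macroblock_rate(complexities[i], q) for i in segment)
        if rate <= target_rate:
            return q, rate
    # No quantizer scale fits: fall back to the coarsest one.
    return q_max, sum(macroblock_rate(complexities[i], q_max) for i in segment)

# Twenty macroblocks with made-up integer complexities between 100 and 999.
complexities = [(i * 37) % 900 + 100 for i in range(20)]
segments = make_segments(len(complexities))
results = [choose_quantizer(seg, complexities, target_rate=500) for seg in segments]
```

Each segment is settled by examining at most 31 candidate quantizer scales over five macroblocks, independently of the other segments, which illustrates why the rate control can be fast and small in circuit scale.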