Recently, with the arrival of the age of multimedia, which handles audio, video and pixel values in an integrated manner, existing information media, for example, newspapers, journals, television, radio and telephone, and other means through which information is conveyed to people, have come under the scope of multimedia. In general, multimedia refers to a representation in which not only characters but also graphic symbols, audio and especially pictures and the like are related to each other. However, in order to include the aforementioned existing information media in the scope of multimedia, it is a prerequisite to represent such information in digital form.
However, when the amount of information contained in each of the aforementioned information media is estimated in digital form, a character requires 1 to 2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality), and a moving picture requires more than 100 Mbits per second (present television reception quality). Therefore, it is not realistic to handle such a vast amount of information directly in digital form. For example, a videophone has already been put into practical use via the Integrated Services Digital Network (ISDN) with a transmission rate of 64 Kbits/sec to 1.5 Mbits/sec; however, it is impossible to transmit a picture captured by a TV camera directly over the ISDN.
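The gap between these figures can be checked with simple arithmetic. The following is a minimal sketch in Python; the picture format (720x480 pixels, 30 frames per second, 12 bits per pixel on average) is an assumption chosen for illustration and is not stated in the text. Under that assumption the uncompressed rate is about 124 Mbits/sec, so the required compression factor ranges from roughly 83 at 1.5 Mbits/sec to 1944 at 64 Kbits/sec.

```python
def raw_bitrate(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * frames_per_second * bits_per_pixel


# Assumed picture format (not given in the text): 720x480, 30 frames/sec,
# 8-bit samples with 4:2:0 chroma subsampling, i.e. 12 bits per pixel on average.
video_bps = raw_bitrate(720, 480, 30, 12)

ISDN_MIN_BPS = 64_000      # 64 Kbits/sec
ISDN_MAX_BPS = 1_500_000   # 1.5 Mbits/sec

print(video_bps / 1e6)           # raw rate in Mbits/sec
print(video_bps / ISDN_MAX_BPS)  # compression factor needed at 1.5 Mbits/sec
print(video_bps / ISDN_MIN_BPS)  # compression factor needed at 64 Kbits/sec
```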
Information compression coding techniques are therefore required. For instance, in the case of a videophone, the H.261 and H.263 compression coding standards recommended by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) are employed. According to the compression coding of the MPEG-1 standard, picture information as well as audio information can be stored on an ordinary music Compact Disc (CD).
Here, Moving Picture Experts Group (MPEG) denotes international standards for the compression of moving picture signals. The MPEG-1 standard compresses video signals down to 1.5 Mbits/sec, that is, it compresses the information included in TV signals to approximately one hundredth. Since the quality targeted by the MPEG-1 standard was medium quality, so as to realize a transmission rate of primarily about 1.5 Mbits/sec, MPEG-2, which was standardized with a view to meeting the requirements of even higher picture quality, realizes TV broadcast quality by transmitting moving picture signals at a transmission rate of 2 to 15 Mbits/sec.
After that, the working group (ISO/IEC JTC1/SC29/WG11) previously in charge of the standardization of MPEG-1 and then MPEG-2 standardized the MPEG-4 standard, which achieves a compression rate superior to those achieved by MPEG-1 and MPEG-2, allows encoding, decoding, and operation on a per-object basis, and realizes new functions required in the age of multimedia. At first, in the process of standardizing MPEG-4, the aim was to standardize a low bit rate coding method; however, the aim has since been extended to a more versatile coding that includes high bit rate coding for interlaced pictures and others. Moreover, at present, the ISO/IEC and the ITU-T have jointly standardized the MPEG-4 Advanced Video Coding (AVC) standard as a next-generation image encoding method.
In general, in encoding of a moving picture, the amount of information is compressed by reducing redundancy in temporal and spatial directions. Therefore, inter-picture prediction coding, which aims at reducing the temporal redundancy, estimates a motion and generates a predictive picture on a block-by-block basis with reference to prior and subsequent pictures, and then encodes a differential value between the obtained predictive picture and a current picture to be encoded. Here, a “picture” is a term representing a single screen: it represents a frame when used for a progressive picture, whereas it represents a frame or a field when used for an interlaced picture. An interlaced picture here is a picture in which a single frame consists of two fields captured at different times. For encoding and decoding an interlaced picture, three ways are possible: processing a single frame as a frame, as two fields, or as a frame structure or a field structure selected for each block in the frame.
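The relation between a frame and its two fields can be sketched as follows. This is an illustrative Python fragment, assuming the usual interlaced layout in which one field holds the even-numbered lines and the other the odd-numbered lines; the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of lines) into its top and bottom fields."""
    top = frame[0::2]     # lines 0, 2, 4, ...
    bottom = frame[1::2]  # lines 1, 3, 5, ...
    return top, bottom


def merge_fields(top, bottom):
    """Interleave two fields of equal height back into one frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


# A 4-line frame in which every line is filled with its own line number.
frame = [[line] * 4 for line in range(4)]
top, bottom = split_fields(frame)
restored = merge_fields(top, bottom)
```

Processing the picture "as a frame" corresponds to coding `frame` directly, while processing it "as two fields" corresponds to coding `top` and `bottom` separately.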
A picture to which intra-picture prediction coding is performed without reference pictures (in other words, without referring to other pictures) is called an “I-picture”. Further, a picture to which the inter-picture prediction coding is performed with reference to only a single prior or subsequent picture is called a “P-picture”. A picture to which the inter-picture prediction coding is performed by referring simultaneously to two pictures is called a “B-picture”. The B-picture needs, as reference pictures, two pictures displayed either before or after a current picture to be encoded. Here, prediction referring to a picture whose display time is before that of the current picture is called forward-directional prediction, prediction referring to a picture whose display time is after that of the current picture is called backward-directional prediction, and prediction referring to an arbitrary combination of two pictures selected from the pictures whose display time is either before or after that of the current picture is called bi-directional prediction. The reference image (reference picture) can be designated for each block that is a basic unit for encoding and decoding. However, as a condition for encoding and decoding these pictures, the pictures to be referred to need to have been already encoded and decoded.
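The requirement that reference pictures be processed first means that the coding order can differ from the display order. The following Python sketch illustrates this under a simplifying assumption that is not stated in the text: each B-picture refers to the nearest I- or P-picture before it and the nearest one after it in display order. The function name is hypothetical.

```python
def coding_order(display_types):
    """Return picture indices in an order where every B-picture comes
    after the I/P pictures that surround it in display order."""
    order = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)   # wait until the next I/P picture is coded
        else:
            order.append(index)       # I- or P-picture is coded first
            order.extend(pending_b)   # then the B-pictures displayed before it
            pending_b = []
    order.extend(pending_b)           # trailing B-pictures have no later reference
    return order


display = ["I", "B", "B", "P", "B", "B", "P"]
order = coding_order(display)
print(order)
```

In this example, pictures 1 and 2 are displayed before picture 3 but are coded after it, because picture 3 serves as their backward reference.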
Motion compensation inter-picture prediction coding is used for encoding the P-picture or the B-picture. Here, the motion compensation inter-picture prediction coding is an encoding method which applies motion compensation to inter-picture prediction coding. Moreover, motion compensation is a method of reducing the amount of data while increasing prediction precision, not by simply predicting a picture from the pixel values of a reference frame, but by estimating an amount of motion (hereinafter referred to as a “motion vector”) of each part in a picture and performing prediction in consideration of the estimated amount of motion. For example, the amount of data is reduced by estimating a motion vector of a current picture to be encoded and encoding a predictive differential between the current picture to be encoded and a predicted value which is shifted by the estimated motion vector. Since this method requires information about the motion vector at the time of decoding, the motion vector is also encoded and recorded or transmitted.
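Motion estimation and the resulting predictive differential can be illustrated with a small full-search block-matching sketch in Python. The matching criterion (sum of absolute differences), the search range, and all function names are assumptions made for illustration; the text does not specify how the motion vector is estimated. Here the vector (dx, dy) points from the current block to the matching position in the reference picture.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def estimate_motion(cur, ref, bx, by, n, search):
    """Full search over displacements within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue  # candidate block would leave the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def predictive_differential(cur, ref, bx, by, mv, n):
    """Current block minus the motion-shifted reference block."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)] for y in range(n)]


def frame_with_object(top, left, size=8):
    """Black frame containing a bright 2 x 2 object."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 100
    return frame


ref = frame_with_object(2, 2)   # object position in the reference picture
cur = frame_with_object(3, 3)   # object moved one pixel right and down
mv = estimate_motion(cur, ref, bx=2, by=2, n=4, search=2)
diff = predictive_differential(cur, ref, 2, 2, mv, 4)
zero_mv_cost = sad(cur, ref, 2, 2, 0, 0, 4)
```

With the estimated vector the differential block is all zeros, whereas predicting from the same position in the reference picture (zero motion) leaves a non-zero difference that would have to be encoded.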
However, if the bi-directional prediction is used in inter-picture prediction coding with motion prediction for B-pictures, motion vectors in two directions are required, which results in an increase in the coding amount of the motion vectors. Therefore, if the increased coding amount accounts for a significant part of the total coding amount, it is impossible to encode the predictive differential adequately, so that the image quality of the encoded picture is lowered.
Thus, as a conventional method for inhibiting the increase in the coding amount due to motion vectors, the use of the bi-directional prediction is prohibited if the available coding amount is not sufficient for the data amount of an input picture. As a result, it is possible to limit the coding amount generated by motion vectors. This method is proposed in Patent Document 1, for example.
FIG. 1 is a block diagram showing one example of a functional structure of the above-described conventional picture encoding device 200. As shown in FIG. 1, the picture encoding device 200 includes a block dividing unit 1, a subtractor 2, a Discrete Cosine Transform (DCT) unit 3, a quantization unit 4, a variable length coding unit 5, a buffer 6, a coding amount control unit 7, an inverse quantization unit 8, an Inverse Discrete Cosine Transform (IDCT) unit 9, an adder 10, a memory 11, and a motion estimation unit 12.
Regarding input picture data 101, the block dividing unit 1 divides, per predetermined unit of coding amount, the input picture data 101 into macroblocks each of which has a predetermined number of pixels. The motion estimation unit 12 obtains a reference image corresponding to each macroblock from the memory 11, detects a motion vector from the reference image, and outputs the detected motion vector together with a predictive image generated using that motion vector. The motion vector outputted from the motion estimation unit 12 is not processed by the subtractor 2, the DCT unit 3, or the quantization unit 4, but is provided to the variable length coding unit 5 to be subjected to variable length coding. The subtractor 2 outputs a predictive differential signal representing a differential between the output from the block dividing unit 1 and the predictive image. The DCT unit 3 outputs a signal obtained by applying DCT to the output from the subtractor 2. The quantization unit 4 quantizes the signal outputted from the DCT unit 3, using a quantization parameter outputted from the coding amount control unit 7. Here, the quantization parameter is a parameter representing a quantization step to be used in the quantization unit 4 and the inverse quantization unit 8. Furthermore, the variable length coding unit 5 applies variable length coding to the output from the quantization unit 4, and provides the resulting signal to the buffer 6.
On the other hand, in order to generate a reference image, the inverse quantization unit 8 outputs a signal obtained by inversely quantizing the output from the quantization unit 4 using the quantization parameter. The IDCT unit 9 applies IDCT to the inversely-quantized signal, and provides the resulting signal to the adder 10. The adder 10 adds the signal to which the IDCT has been applied to the predictive image to generate a reference image, and stores the reference image into the memory 11.
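The forward path (subtractor 2, DCT unit 3, quantization unit 4) and the local decoding loop (inverse quantization unit 8, IDCT unit 9, adder 10) can be sketched together in Python as follows. This is a simplified illustration, not the device of FIG. 1 itself: it assumes an orthonormal DCT, a single uniform quantization step `qstep` standing in for the quantization parameter, and a 4 x 4 block; variable length coding is omitted, and all function names are hypothetical.

```python
import math


def _basis(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine function."""
    return [[math.sqrt((1 if k == 0 else 2) / n) *
             math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]


def dct2(block):
    """Two-dimensional DCT of a square block."""
    n = len(block)
    c = _basis(n)
    return [[sum(c[u][y] * c[v][x] * block[y][x]
                 for y in range(n) for x in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coef):
    """Two-dimensional inverse DCT of a square coefficient block."""
    n = len(coef)
    c = _basis(n)
    return [[sum(c[u][y] * c[v][x] * coef[u][v]
                 for u in range(n) for v in range(n))
             for x in range(n)] for y in range(n)]


def encode_block(cur, pred, qstep):
    n = len(cur)
    diff = [[cur[y][x] - pred[y][x] for x in range(n)]
            for y in range(n)]                        # subtractor 2
    coef = dct2(diff)                                 # DCT unit 3
    return [[round(value / qstep) for value in row]
            for row in coef]                          # quantization unit 4


def reconstruct_block(levels, pred, qstep):
    n = len(levels)
    coef = [[level * qstep for level in row]
            for row in levels]                        # inverse quantization unit 8
    diff = idct2(coef)                                # IDCT unit 9
    return [[pred[y][x] + diff[y][x] for x in range(n)]
            for y in range(n)]                        # adder 10


cur = [[108] * 4 for _ in range(4)]    # current block
pred = [[100] * 4 for _ in range(4)]   # predictive image for that block
levels = encode_block(cur, pred, qstep=8)
recon = reconstruct_block(levels, pred, qstep=8)
```

Because the reconstruction uses only the quantized levels, the predictive image and the quantization step, a decoder can form the same reference image as the encoder. In this flat example only the DC level is non-zero and the block is reconstructed exactly; in general a larger quantization step gives fewer bits and a larger reconstruction error.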
The coding amount control unit 7 calculates a quantization parameter according to a target coding amount (bit rate) after compression coding and the total coding amount which the variable length coding unit 5 has generated, and then provides the quantization parameter to the quantization unit 4. Thereby the coding amount control unit 7 can control the coding amount. Furthermore, the coding amount control unit 7 outputs a bi-directional prediction control signal to the motion estimation unit 12 when the data amount of the input picture data is too large for the target coding amount. Thereby, it is possible to control the motion estimation unit 12 so as to prevent it from selecting a predictive image or a motion vector for the bi-directional prediction (in other words, to control the motion estimation unit 12 to select a predictive image or a motion vector using forward or backward prediction).
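The two roles of the coding amount control unit 7 can be illustrated with the following Python sketch. The text does not give the actual control rule, so everything specific here is an assumption made for illustration: the step-by-one update of the quantization parameter, its range of 1 to 31, the `ratio_limit` threshold, and the class and method names.

```python
from dataclasses import dataclass


@dataclass
class CodingAmountControl:
    target_bits_per_picture: int
    qp: int = 16            # assumed initial quantization parameter
    qp_min: int = 1         # assumed lower bound
    qp_max: int = 31        # assumed upper bound
    produced_bits: int = 0  # total coding amount generated so far
    pictures: int = 0

    def update(self, bits_of_last_picture):
        """Adjust the quantization parameter from the total coding amount."""
        self.produced_bits += bits_of_last_picture
        self.pictures += 1
        budget = self.target_bits_per_picture * self.pictures
        if self.produced_bits > budget:
            self.qp = min(self.qp + 1, self.qp_max)  # over budget: coarser
        elif self.produced_bits < budget:
            self.qp = max(self.qp - 1, self.qp_min)  # under budget: finer
        return self.qp

    def allow_bidirectional(self, input_bits, ratio_limit=100.0):
        """False when the input data amount is too large for the target,
        which corresponds to asserting the bi-directional prediction
        control signal."""
        return input_bits / self.target_bits_per_picture <= ratio_limit


control = CodingAmountControl(target_bits_per_picture=50_000)
qp_after_large_picture = control.update(80_000)   # over budget
qp_after_small_picture = control.update(10_000)   # back under budget
```

When `allow_bidirectional` returns False, a motion estimation stage would restrict its choice to forward or backward prediction, so that only one motion vector per block has to be coded.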
According to the above-described conventional device, when the target coding amount is small (in other words, the quantization parameter is large) and the coding amount allocated to the frames to be encoded is small, selection of the bi-directional prediction is prohibited in order to prevent an increase in the coding amount of the motion vectors, so that it is possible to suppress the lowering of image quality of the encoded picture due to the coding amount increase described above.
[Patent Document 1] Japanese Patent Application Laid-Open No. 2000-13802