Video compression technologies are used to transmit and store video data efficiently. MPEG-1 to MPEG-4 and H.261 to H.264 are widely used video compression standards.
In these video compression standards, a picture to be encoded is divided into a plurality of blocks, which are encoded and then decoded. To increase coding efficiency, the predictive coding described below is used. In intra-frame prediction, a predictive signal is generated using a reconstructed neighboring picture signal (a signal restored from previously compressed picture data) that is present in the same frame as the target block. The predictive signal is subtracted from the signal of the target block, and the resulting difference is encoded. In inter-frame prediction, a reconstructed picture signal present in a frame different from the frame including the target block is searched for a displacement of the signal, and a predictive signal is generated that compensates for the displacement. The predictive signal is subtracted from the signal of the target block, and the resulting difference is encoded. The reconstructed picture that is subjected to the motion search and compensation is referred to as a reference picture.
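By way of illustration only, the inter-frame prediction described above may be sketched as follows. The sketch assumes an exhaustive (full) search that uses the sum of absolute differences as the error measure; neither choice is prescribed by the standards, and the names `sad`, `get_block`, `motion_search`, and `residual` are hypothetical helpers introduced for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left sample is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(target_block, reference, top, left, search_range):
    """Search the reference picture around (top, left) for the displacement
    (dy, dx) that gives the smallest SAD. Assumes at least one candidate
    position lies inside the reference picture."""
    size = len(target_block)
    height, width = len(reference), len(reference[0])
    best = None  # (cost, (dy, dx))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the picture
            cost = sad(target_block, get_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1], best[0]


def residual(target_block, prediction):
    """Difference signal that would be passed on for encoding."""
    return [[t - p for t, p in zip(row_t, row_p)]
            for row_t, row_p in zip(target_block, prediction)]


# Synthetic reference picture with unique sample values.
reference = [[8 * y + x for x in range(8)] for y in range(8)]
# Target block located at (2, 2) whose content sits at (3, 4) in the reference.
target_block = get_block(reference, 3, 4, 2)
mv, cost = motion_search(target_block, reference, top=2, left=2, search_range=3)
prediction = get_block(reference, 2 + mv[0], 2 + mv[1], 2)
diff = residual(target_block, prediction)
```

In this synthetic case the search recovers the displacement exactly, so the difference signal is all zeros. For real video the residual is generally non-zero but small, which is what makes it cheaper to encode than the original block.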
In bidirectional inter-frame prediction, reference is made not only to a past picture but also to a future picture that follows the target picture in display order (the future picture must be encoded and reconstructed before the target picture is encoded). The predictive signals acquired from the past picture and the future picture are then averaged. This prediction method is effective for predicting an object that is not present in the past picture but appears in the future picture, and for reducing the noise included in the two predictive signals.
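The averaging of the two predictive signals may be sketched as follows. This is a minimal illustration that assumes integer samples and round-to-nearest averaging; the function name `bi_predict` and the sample values are hypothetical.

```python
def bi_predict(past_pred, future_pred):
    """Average two predictive blocks sample by sample.
    The +1 before the shift rounds to the nearest integer (an assumption
    of this sketch)."""
    return [[(p + f + 1) >> 1 for p, f in zip(row_p, row_f)]
            for row_p, row_f in zip(past_pred, future_pred)]


# Two noisy predictions of a flat block whose true value is 100.
past_pred = [[104, 96], [98, 102]]
future_pred = [[96, 104], [102, 98]]
averaged = bi_predict(past_pred, future_pred)
```

In this contrived example the noise in the two predictions has opposite sign, so it cancels completely. With independent noise the cancellation is only partial, but the averaged signal is still, on average, closer to the true signal than either prediction alone.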
Furthermore, in the inter-frame prediction defined in H.264, a motion search is performed on a plurality of reference pictures that have been encoded and then reconstructed, and the predictive signal with the smallest error is selected as the optimum predictive signal for the target block. The difference between the pixel signal of the target block and the optimum predictive signal is calculated and then subjected to a discrete cosine transform, quantization, and entropy encoding. At the same time, information identifying the selected reference picture (referred to as a “reference index”) and information identifying the region in the selected reference picture from which the optimum predictive signal for the target block is acquired (referred to as a “motion vector”) are also encoded.
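The selection of an optimum predictive signal from a plurality of reference pictures may be sketched as follows. The sketch assumes an exhaustive integer-sample search with the sum of absolute differences as the error measure; an actual H.264 encoder is free to use other search strategies and cost measures. The function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def select_prediction(target_block, reference_pictures, top, left, search_range):
    """Search every reference picture and return (reference_index,
    motion_vector, error) for the candidate with the smallest error.
    Assumes the block position (top, left) lies inside each picture."""
    size = len(target_block)
    best = None  # (cost, ref_idx, (dy, dx))
    for ref_idx, ref in enumerate(reference_pictures):
        height, width = len(ref), len(ref[0])
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + size > height or x + size > width:
                    continue  # candidate region falls outside the picture
                candidate = [row[x:x + size] for row in ref[y:y + size]]
                cost = sad(target_block, candidate)
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dy, dx))
    cost, ref_idx, motion_vector = best
    return ref_idx, motion_vector, cost


# Two synthetic 6x6 reference pictures: a blank one and a textured one.
ref0 = [[0] * 6 for _ in range(6)]
ref1 = [[10 * y + x for x in range(6)] for y in range(6)]
# Target block at (1, 1) whose content is found at (2, 3) in ref1.
target_block = [[23, 24], [33, 34]]
ref_idx, motion_vector, error = select_prediction(
    target_block, [ref0, ref1], top=1, left=1, search_range=2)
```

The returned pair of reference index and motion vector corresponds to the side information that is encoded alongside the transformed and quantized difference signal.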
In H.264, a plurality of reconstructed pictures may be referenced. These reconstructed pictures are stored, as reference pictures to be used in prediction, in a decoded picture buffer (DPB), which is a picture buffer memory. The size of the decoded picture buffer (DPB) is defined by a profile and a level, and is specified as a memory size in bytes rather than as a number of reference pictures. Even with the same profile and level, the number of storable reference pictures therefore varies according to the frame size of the pictures. For example, in the case where the profile is main (Main) and the level is 3.2, the maximum size of the picture buffer (MaxDPBSize: Maximum Decoded Picture Buffer Size) for storage of reference pictures used for prediction is defined as 7680.0×1024 [bytes]. Therefore, the maximum number of reconstructed pictures storable in the decoded picture buffer (DPB) is 5 if the pictures are 1280×720 in 4:2:0 format, and 4 if the pictures are 1280×1024 in 4:2:0 format. FIGS. 1(a), 1(b) and 1(c) show pictures arranged in the decoded picture buffer, in which the frame size of the pictures determines the maximum number of storable reconstructed pictures, which is 4 (FIG. 1(a)), 5 (FIG. 1(b)), or 6 (FIG. 1(c)). Memory pointers are set adaptively according to the frame size of the reconstructed pictures in the picture buffer memory prepared in advance, whereby an adaptive memory arrangement is achieved in the picture buffer memory.
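The two picture counts given above can be checked with a short calculation. The sketch below assumes 8-bit samples, so that a 4:2:0 picture occupies width × height × 3/2 bytes, and takes the buffer size of 7680.0×1024 bytes from the example above; the function names are hypothetical.

```python
MAX_DPB_BYTES = 7680 * 1024  # Main profile, level 3.2, as stated in the text


def frame_bytes_420(width, height):
    """Bytes per picture for 8-bit 4:2:0 video (assumption of this sketch):
    one full-resolution luma plane plus two quarter-resolution chroma planes."""
    return width * height * 3 // 2


def max_storable_pictures(width, height, max_dpb_bytes=MAX_DPB_BYTES):
    """Number of whole pictures that fit in the decoded picture buffer."""
    return max_dpb_bytes // frame_bytes_420(width, height)


pictures_720p = max_storable_pictures(1280, 720)    # 7864320 // 1382400
pictures_1024 = max_storable_pictures(1280, 1024)   # 7864320 // 1966080
```

A 1280×720 picture occupies 1,382,400 bytes, so five pictures fit with space left over, whereas a 1280×1024 picture occupies 1,966,080 bytes, so exactly four pictures fill the buffer.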