Moving pictures are used in an increasing number of applications in fields ranging from video telephony and video conferencing to DVD and digital television. In order to transmit moving pictures, it is necessary to transmit an enormous volume of data via an existing transmission channel in which an effective frequency bandwidth is limited. When digital data is transmitted within a limited transmission band, it becomes an absolute necessity to compress or reduce the volume of data to be transmitted.
For the purpose of enabling inter-operability between video data in a plurality of systems dedicated to applications which were designed by different manufacturers, video coding standards have been developed for compressing the volume of video data in a common method. Such video coding standards include H.261 and H.263 developed by ITU, and MPEG-1, MPEG-2, and MPEG-4 developed by ISO/IEC.
A basic approach to coding taken by many of the above standards is comprised of the following major stages:
1. dividing each picture into blocks made up of pixels so as to enable processing to be performed at a block level on the pictures that constitute video. A picture refers to a frame or a field;
2. reducing spatial redundancy in a picture by performing transformation, quantization, and entropy coding on video data in one block; and
3. coding a difference between consecutive pictures, utilizing correlation between the consecutive pictures.
The last stage is achieved by the use of motion estimation and motion compensation techniques. An encoder performs motion estimation, searching already coded frames for the position of image data that is strongly correlated with each block, in order to determine, on a block-by-block basis, a motion vector that points to that predictive image data. Furthermore, the encoder and the decoder perform motion compensation to extract the predictive image data indicated by the motion vector.
FIG. 1 shows an example configuration of a video encoder (moving picture coding apparatus). The video encoder illustrated in the diagram is comprised of: a transform unit 13 operable to transform spatial video data into a frequency domain; a quantization unit 14 operable to quantize the transform coefficients obtained by the transform unit 13; a variable length coding unit 15 operable to perform entropy coding on the quantized transform coefficients; a video buffer 17 for supplying a transmission channel with compressed video data at a variable bit rate depending on a transmission rate; a decoder 16, and a motion estimation unit 19.
Video data 10 is input to the encoder shown in FIG. 1 in the form of pixel values using pulse-code modulation (PCM). A subtractor 11 calculates a differential value between the video data 10 and a motion-compensated image 12. The motion-compensated image 12 is obtained by decoding an already coded image (“a current decoded picture”) and performing motion compensation on the result. This is carried out by a decoder 16 paired with the video encoder. The decoder 16 performs the coding procedure in inverse order. Stated another way, the decoder 16 is comprised of an inverse quantization unit (Q-1), an inverse discrete cosine transform unit (IDCT), and an adder for adding a decoded difference and a motion-compensated image so as to generate a preceding picture which is equivalent to the one obtained at the decoder's side.
In motion-compensated coding, motion-compensated data for a current picture is generated from the picture data of a corresponding decoded picture, based on motion estimation performed between the current picture and that decoded picture. A predicted motion is represented by a two-dimensional motion vector indicating a pixel displacement between the decoded picture and the current picture. Usually, motion estimation is performed on a block-by-block basis. Stated another way, the block in the decoded picture which is most strongly correlated with a block in the current picture is regarded as the motion-compensated image for that block. The encoder incorporates the motion estimation unit 19, which performs such motion estimation, and a motion compensation unit MC, which generates a motion-compensated image from the decoded picture in accordance with the motion vector.
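The block-by-block search described above can be illustrated with a minimal sketch. It assumes an exhaustive (full) search over a small window and the sum of absolute differences (SAD) as the correlation measure; neither choice is stated in the text, and the function names, block size, and search range are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def estimate_motion(cur, ref, cx, cy, size=4, search=2):
    """Full search: return ((mvx, mvy), cost) for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the decoded picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


def motion_compensate(ref, cx, cy, mv, size=4):
    """Extract the predicted block that the motion vector points to."""
    rx, ry = cx + mv[0], cy + mv[1]
    return [row[rx:rx + size] for row in ref[ry:ry + size]]


# Decoded (reference) picture: 8x8 with a unique value per pixel.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
# Current picture: the same content shifted right by one pixel.
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

mv, cost = estimate_motion(cur, ref, cx=2, cy=2)
pred = motion_compensate(ref, 2, 2, mv)
```

Here the motion vector is expressed as the displacement from the block position in the current picture to the matching position in the decoded picture, so content that moved one pixel to the right yields a vector pointing one pixel to the left.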
The video encoder illustrated in FIG. 1 operates in the following manner. The video image of the video signal 10 is divided into a certain number of small blocks generally called macro blocks. For example, a video image 20 shown in FIG. 2 is divided into a plurality of macro blocks 21. Generally, each of the macro blocks has a size of 16×16 pixels.
Furthermore, a picture is divided into a certain number of slices 22. Each slice is made up of a plurality of macro blocks and serves as a unit for recovering synchronization when data is lost. Note that a slice is not necessarily made up of the macro blocks of a single row as shown in FIG. 2; a slice may also include macro blocks from a plurality of rows, and the boundary with another slice may fall in the middle of a row.
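As an illustration of the partitioning described above, the following sketch computes the macro block grid for a picture and assigns macro blocks to slices. It assumes that macro blocks are assigned to slices in raster-scan order; the helper names and the slice lengths are hypothetical, and the 176×144 picture size is only an example.

```python
MB_SIZE = 16  # macro block size in pixels, as stated in the text


def macro_block_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macro blocks covering the picture."""
    cols = (width + mb_size - 1) // mb_size   # round up for partial blocks
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


def split_into_slices(total_mbs, slice_lengths):
    """Assign macro blocks, in raster-scan order, to slices of given lengths."""
    if sum(slice_lengths) != total_mbs:
        raise ValueError("slice lengths must cover every macro block exactly once")
    slices, start = [], 0
    for length in slice_lengths:
        slices.append(list(range(start, start + length)))
        start += length
    return slices


cols, rows = macro_block_grid(176, 144)          # 11 x 9 macro blocks
slices = split_into_slices(cols * rows, [11, 16, 72])
# Rows touched by the second slice: it ends in the middle of a row.
rows_of_second_slice = sorted({mb // cols for mb in slices[1]})
```

The second slice in this example covers one full row and part of the next, which is the case where a slice boundary falls in the middle of a row.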
When image data in video is coded by reducing only the spatial redundancy in the image, the resultant picture is called an I picture. An I picture is coded with reference to only the pixel values in the picture itself. The data size of a coded I picture is large because the temporal information that would otherwise reduce the volume of data cannot be used for an I picture.
With the aim of performing efficient compression utilizing temporal redundancy between consecutive pictures, predictive coding is performed on the consecutive pictures on the basis of motion estimation and motion compensation. When motion estimation refers to one reference picture which has already been coded and decoded, the picture being coded is called a P picture. Meanwhile, when it refers to two reference pictures (usually, the forward and backward pictures in display order with respect to the current picture), the picture being coded is called a B picture.
According to the H.26L standard, a picture coding method under development, motion compensation for each 16×16 macro block can be carried out using different block sizes. Each motion vector can be determined with respect to a block with a size of 4×4, 4×8, 8×4, 8×8, 8×16, or 16×16 pixels. The effect of using a smaller block size for motion compensation is that it becomes possible to describe detailed motions.
Based on the result of motion estimation, motion compensation is performed using the determined motion vector. Subsequently, the information included in the prediction error block obtained from the predicted block is transformed into transform coefficients in the transform unit 13. Generally, a two-dimensional Discrete Cosine Transform (DCT) is employed. The transform coefficients thus obtained are quantized, and finally entropy coding (VLC) is performed on the result by the variable length coding unit 15. Note that a motion vector calculated by the motion estimation unit 19 is used for motion compensation and is incorporated into compressed video data 18 via the variable length coding unit 15 and the video buffer 17.
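The path from prediction error to quantized coefficients, together with the reconstruction performed by the decoder inside the encoder, can be sketched as follows. This is a simplified illustration under stated assumptions: a 4×4 block, an orthonormal floating-point DCT-II, and a single uniform quantization step. None of these parameters come from the text, entropy coding is omitted, and all function names are hypothetical.

```python
import math

N = 4  # block size; an illustrative assumption


def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct2(block):
    """Two-dimensional DCT of an N x N block (rows of coefficients)."""
    return [[sum(block[y][x] * _basis(v, y) * _basis(u, x)
                 for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct2(coeffs):
    """Inverse two-dimensional DCT."""
    return [[sum(coeffs[v][u] * _basis(v, y) * _basis(u, x)
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]


def quantize(coeffs, step):
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[lv * step for lv in row] for row in levels]


STEP = 4  # uniform quantization step; an illustrative assumption

pred = [[100 + x for x in range(N)] for _ in range(N)]   # motion-compensated block
cur = [[p + 8 for p in row] for row in pred]             # block being coded

# Encoder side: prediction error -> transform -> quantization.
residual = [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(cur, pred)]
levels = quantize(dct2(residual), STEP)

# Decoder side: inverse quantization -> inverse transform -> add the prediction.
decoded_residual = idct2(dequantize(levels, STEP))
recon = [[p + r for p, r in zip(prow, rrow)]
         for prow, rrow in zip(pred, decoded_residual)]
```

Because the prediction error in this example is constant, all of its energy ends up in a single coefficient, which is what makes the transformed block cheap to entropy code.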
A transmission stream of the compressed video data 18 is transmitted to the decoder (picture decoding apparatus), where a sequence of coded video images is reproduced on the basis of the received data. The configuration of the decoder corresponds to that of the decoder 16 included in the video encoder shown in FIG. 1.
In a new video coding method, it is possible to use a plurality of bi-directionally predictive pictures so as to realize more efficient picture coding. For this reason, a motion estimation unit and a motion compensation unit include multi frame buffers for providing a variety of reference pictures. Information identifying the individual reference picture is added to each motion vector.
The internal structure of a multi frame buffer is shown in FIG. 3, in which reference number 30 denotes the whole structure. The multi frame buffer is composed of a plurality of memory areas 31 and 32 for storing frames of the video signal. The memory areas in the multi frame buffer 30 are divided into two different kinds, that is, a short term picture memory area 33 mainly for storing pictures that are used as reference pictures for a short term, and a long term picture memory area 34 mainly for storing pictures that are used as reference pictures for a long term.
The multi frame buffer stores reference pictures selected as appropriate for coding or decoding specific pictures. The procedure for storing reference pictures is divided into two processing stages, that is, (1) a stage of realigning reference pictures and (2) a stage of buffering reference pictures.
(1) The reference pictures are realigned based on the reference picture order information transmitted in the slice layer. The ordering of reference pictures affects the coding or decoding of the group of macro blocks included in one slice. The aim of this processing is to reduce the number of bits of the information that indicates which reference picture is referred to at the time of motion compensation, by assigning a smaller number, that is, a reference number with a shorter signal length, to a picture that is referred to more frequently.
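The bit saving obtained by realigning reference pictures can be illustrated with a small calculation. The sketch below assumes that reference numbers are signaled with an unsigned Exp-Golomb code, in which smaller numbers have shorter codewords; the text does not name the code, and the picture labels and usage counts are made up for illustration.

```python
def code_length(index):
    """Bits of an unsigned Exp-Golomb codeword (assumed code, not from the text)."""
    return 2 * (index + 1).bit_length() - 1


def realign(default_order, usage_counts):
    """Place frequently referenced pictures first; ties keep the default order."""
    return sorted(default_order, key=lambda pic: -usage_counts.get(pic, 0))


def total_bits(order, usage_counts):
    """Bits spent signaling reference numbers for all references in a slice."""
    return sum(code_length(i) * usage_counts.get(pic, 0)
               for i, pic in enumerate(order))


# Hypothetical slice: picture "P0" is referenced far more often than the others.
default_order = ["P3", "P2", "P1", "P0"]
usage = {"P3": 2, "P2": 1, "P1": 0, "P0": 9}

realigned = realign(default_order, usage)
bits_default = total_bits(default_order, usage)
bits_realigned = total_bits(realigned, usage)
```

In this example the most frequently used picture moves from the last position, where its codeword is 5 bits long, to the first position, where it costs 1 bit, so the total drops from 50 bits to 18 bits.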
(2) As for buffering reference pictures, the storing of each picture that has been coded or decoded is controlled so that the reference pictures held in the multi frame buffer are updated at each coding or decoding process.
For buffering reference pictures, one of two different memory management control modes can be used, that is, a “shift window buffering mode” or an “adaptive memory control buffering mode”.
In the shift window buffering mode, each picture that is a target of coding or decoding is stored in the multi frame buffer. The pictures in the short term picture memory area of the multi frame buffer are replaced one after another by new pictures in a First-In First-Out (FIFO) manner. There is no need to delete any picture data in order to store the picture under processing as long as the buffer has a sufficient unused memory area. Once the unused area of the multi frame buffer has been filled with picture data that has already been processed, the stored picture data is replaced, in the order of storage, by the picture data of new pictures under coding or decoding.
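The FIFO behavior of the shift window buffering mode can be sketched as follows. This is a minimal model of the short term picture memory area only; the class name, the capacity, and the use of integers in place of picture data are assumptions made for illustration.

```python
from collections import deque


class ShiftWindowBuffer:
    """Short term picture memory area managed first-in first-out."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.short_term = deque()

    def store(self, picture):
        """Store a processed picture; return the evicted picture or None."""
        evicted = None
        if len(self.short_term) >= self.capacity:
            # No unused area is left: the oldest stored picture is replaced.
            evicted = self.short_term.popleft()
        self.short_term.append(picture)
        return evicted


buf = ShiftWindowBuffer(capacity=3)
evictions = [buf.store(pic) for pic in range(5)]   # pictures numbered 0..4
remaining = list(buf.short_term)
```

While unused memory areas remain, nothing is deleted; once the buffer is full, each newly stored picture pushes out the one that was stored earliest.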
In the adaptive memory control buffering mode, each picture to be stored in the multi frame buffer or deleted from it is explicitly selected. Memory control is performed according to memory management control processing parameters, which keep memory management on the coding side and on the decoding side consistent with each other. In order to perform such replacement of pictures, a unique identification number for explicitly specifying a picture to be coded or decoded is assigned to each memory area. Note that an index indicating the picture order after the realignment of reference pictures in (1) above is also assigned to each memory area, and this index is called a reference index.
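The explicit control in the adaptive memory control buffering mode can be sketched as a list of commands that is applied identically on the coding side and the decoding side. The command format, the class name, and the way the memory area kind is recorded are hypothetical; they only illustrate the idea that pictures are stored and deleted by explicit identification number rather than in storage order.

```python
class AdaptiveBuffer:
    """Multi frame buffer whose contents change only by explicit commands."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.areas = {}  # picture identification number -> "short" or "long"

    def apply(self, command):
        op, pic_id = command[0], command[1]
        if op == "delete":
            del self.areas[pic_id]          # KeyError if the id is unknown
        elif op == "store":
            if len(self.areas) >= self.capacity:
                raise RuntimeError("buffer full: delete a picture explicitly first")
            self.areas[pic_id] = command[2]
        else:
            raise ValueError(f"unknown command: {op}")


# Hypothetical command sequence carried alongside the coded pictures.
commands = [
    ("store", 0, "long"),
    ("store", 1, "short"),
    ("store", 2, "short"),
    ("delete", 1),          # explicitly chosen, not simply the oldest picture
    ("store", 3, "short"),
]

encoder_side, decoder_side = AdaptiveBuffer(3), AdaptiveBuffer(3)
for cmd in commands:
    encoder_side.apply(cmd)
    decoder_side.apply(cmd)
```

Because both sides execute the same commands, their buffers hold the same pictures after every step, and the oldest picture (number 0) can be kept while a newer one is removed.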
Several problems remain in the above memory management control modes. In particular, the conventional memory management control modes cannot efficiently process interlaced video data. Interlaced video data comprises frames each composed of two fields (a top field and a bottom field), which differ in time and in vertical spatial location. Pictures may be coded field by field, which makes memory management complicated.