Published video coding standards include ITU-T H.261, ITU-T H.263, ISO/IEC MPEG-1, ISO/IEC MPEG-2, and ISO/IEC MPEG-4 Part 2. These standards are herein referred to as conventional video coding standards.
There is a standardization effort going on in a Joint Video Team (JVT) of ITU-T and ISO/IEC. The work of JVT is based on an earlier standardization project in ITU-T called H.26L. The goal of the JVT standardization is to release the same standard text as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10 (MPEG-4 Part 10). The draft standard is referred to as the JVT coding standard in this paper, and the codec according to the draft standard is referred to as the JVT codec.
The optional reference picture selection mode of H.263 and the NEWPRED coding tool of MPEG-4 Part 2 enable selection of the reference frame for motion compensation per each picture segment, e.g., per each slice in H.263. Furthermore, the optional Enhanced Reference Picture Selection mode of H.263 and the JVT coding standard enable selection of the reference frame for each macroblock separately.
Reference picture selection enables many types of temporal scalability schemes. FIG. 1 shows an example of a temporal scalability scheme, which is herein referred to as recursive temporal scalability. The example scheme can be decoded with three constant frame rates. FIG. 2 depicts a scheme referred to as Video Redundancy Coding, where a sequence of pictures is divided into two or more independently coded threads in an interleaved manner. The arrows in these and all the subsequent figures indicate the direction of motion compensation and the values under the frames correspond to the relative capturing and displaying times of the frames.
In conventional video coding standards, the decoding order of pictures is the same as the display order except for B pictures. A block in a conventional B picture can be bi-directionally temporally predicted from two reference pictures, where one reference picture is temporally preceding and the other reference picture is temporally succeeding in display order. Only the latest reference picture in decoding order can succeed the B picture in display order (exception: interlaced coding in H.263 where both field pictures of a temporally subsequent reference frame can precede a B picture in decoding order). A conventional B picture cannot be used as a reference picture for temporal prediction, and therefore a conventional B picture can be disposed of without affecting the decoding of any other pictures.
The JVT coding standard includes the following novel technical features compared to earlier standards:
    The decoding order of pictures is decoupled from the display order. The picture number indicates decoding order and the picture order count indicates the display order.
    Reference pictures for a block in a B picture can either be before or after the B picture in display order. Consequently, a B picture stands for a bi-predictive picture instead of a bi-directional picture.
    Pictures that are not used as reference pictures are marked explicitly. A picture of any type (intra, inter, B, etc.) can either be a reference picture or a non-reference picture. (Thus, a B picture can be used as a reference picture for temporal prediction of other pictures.)
    A picture can contain slices that are coded with a different coding type. In other words, a coded picture may consist of an intra-coded slice and a B-coded slice, for example.
Decoupling of display order from decoding order can be beneficial from the point of view of both compression efficiency and error resiliency.
An example of a prediction structure potentially improving compression efficiency is presented in FIG. 3. Boxes indicate pictures, capital letters within boxes indicate coding types, numbers within boxes are picture numbers according to the JVT coding standard, and arrows indicate prediction dependencies. Note that picture B17 is a reference picture for pictures B18. Compression efficiency is potentially improved compared to conventional coding, because the reference pictures for pictures B18 are temporally closer than in conventional coding with PBBP or PBBBP coded picture patterns. Compression efficiency is also potentially improved compared to the conventional PBP coded picture pattern, because some of the reference pictures are bi-directionally predicted.
FIG. 4 presents an example of the intra picture postponement method that can be used to improve error resiliency. Conventionally, an intra picture is coded immediately after a scene cut or as a response to an expired intra picture refresh period, for example. In the intra picture postponement method, an intra picture is not coded immediately after a need to code an intra picture arises, but rather a temporally subsequent picture is selected as an intra picture. Each picture between the coded intra picture and the conventional location of an intra picture is predicted from the next temporally subsequent picture. As FIG. 4 shows, the intra picture postponement method generates two independent inter picture prediction chains, whereas conventional coding algorithms produce a single inter picture chain. It is intuitively clear that the two-chain approach is more robust against erasure errors than the one-chain conventional approach. If one chain suffers from a packet loss, the other chain may still be correctly received. In conventional coding, a packet loss always causes error propagation to the rest of the inter picture prediction chain.
In the JVT coding standard, decoded pictures have to be buffered for two reasons: First, decoded pictures are used as reference pictures for predicting subsequent coded pictures. Second, due to decoupling of the decoding order from the display order, decoded pictures have to be reordered in display order.
The following example is used to explain the problem with separate buffering that the present invention overcomes.
Consider the following sequence of pictures, where P is a predicted picture, BS a reference bi-predictive picture, and BN a non-reference bi-predictive picture, and the number relates to display order:
Display order     P1   BN2  BN3  BS4  BN5  BN6  P7   . . .
Decoding order    P1   P7   BS4  BN2  BN3  BN5  BN6
This can be decoded with three picture memories in the reference picture buffer, but when BN5 is decoded, it is not yet time to display it:
Decoded time      P1   P7   BS4  BN2  BN3  BN5  BN6
Display time                P1   BN2  BN3  BS4  BN5  BN6  P7   . . .
Therefore BN5 has to be stored to reorder pictures in display order.
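The reordering delay in this example can be checked with a short sketch. It is an illustration only, and it assumes that one picture is decoded and one picture is displayed per time slot, and that the number in each picture label is its display position; the function names are hypothetical and do not come from the JVT coding standard.

```python
# Decoding order from the example above.
DECODING_ORDER = ["P1", "P7", "BS4", "BN2", "BN3", "BN5", "BN6"]


def display_index(name):
    """Zero-based display position, taken from the digits in the label."""
    return int("".join(ch for ch in name if ch.isdigit())) - 1


def display_schedule(decoding_order):
    """Return (start_delay, waits).

    start_delay is the smallest number of slots by which display must lag
    decoding so that no picture is displayed before it has been decoded.
    waits maps each picture to the number of slots it is held between
    being decoded and being displayed.
    """
    start_delay = max(
        slot - display_index(pic) for slot, pic in enumerate(decoding_order)
    )
    waits = {
        pic: start_delay + display_index(pic) - slot
        for slot, pic in enumerate(decoding_order)
    }
    return start_delay, waits


if __name__ == "__main__":
    delay, waits = display_schedule(DECODING_ORDER)
    print("display starts", delay, "slots after decoding starts")
    for pic in DECODING_ORDER:
        print(pic, "waits", waits[pic], "slot(s)")
```

Under these assumptions BN2 and BN3 can be displayed in the slot in which they are decoded, whereas BN5, although it is a non-reference picture, has to be held for one slot.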
The problem does not exist in conventional video coding standards, because the display order of all reference pictures is the same as their decoding order and because only the latest decoded reference picture has to be buffered to reorder pictures in display order if B pictures are in use. The conventional video coding standards supporting reference picture selection have a reference picture buffer, but they do not have a picture buffer for display reordering.
The following straightforward solution was proposed for the JVT coding standard: Have a picture buffer for reference pictures that is separate from a picture buffer for display reordering. Specify the maximum number of pictures separately for both buffers.
Let us reconsider the example described above. A reference picture enters the reference picture buffer as soon as it is decoded. A non-reference picture does not enter the reference picture buffer. A decoded picture is removed from the reference picture buffer as soon as it is no longer needed for reference. For example, picture P1 can be removed after decoding of picture BN3. A picture enters the display reordering buffer as soon as it is decoded. A decoded picture is removed from the display reordering buffer when it can be displayed. The table below shows the contents of the buffers just after decoding each picture.
Decoded time        P1   P7   BS4  BN2  BN3  BN5  BN6
Display time                  P1   BN2  BN3  BS4  BN5  BN6  P7
Reference buffer    P1   P1   P1   P1   BS4  BS4  P7
                         P7   BS4  BS4  P7   P7
                              P7   P7
Display buffer      P1   P1   BS4  BS4  BS4  BN5  BN6  P7
                         P7   P7   P7   P7   P7   P7
It can be seen that the required sizes of the reference picture buffer and the display reordering buffer are 3 pictures and 2 pictures respectively.
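The buffer occupancy of the example can be reproduced with a minimal simulation. This is a sketch under stated assumptions, not a normative procedure: the prediction dependencies in REFERENCES are not given in the text and are assumed here (chosen so that P1 is last used by BN3, as stated above), P7 is assumed to remain needed by pictures beyond the excerpt, and display is assumed to start two slots after decoding starts. All names are hypothetical.

```python
DECODING_ORDER = ["P1", "P7", "BS4", "BN2", "BN3", "BN5", "BN6"]

# Assumed prediction dependencies (not stated in the text).
REFERENCES = {
    "P1": [], "P7": ["P1"], "BS4": ["P1", "P7"],
    "BN2": ["P1", "BS4"], "BN3": ["P1", "BS4"],
    "BN5": ["BS4", "P7"], "BN6": ["BS4", "P7"],
}
KEPT_FOR_LATER = {"P7"}  # assumed to be referenced by later pictures
DISPLAY_DELAY = 2        # assumed: display lags decoding by two slots


def display_slot(name):
    """Slot in which a picture is displayed (number in label = display order)."""
    return DISPLAY_DELAY + int(name.lstrip("PBSN")) - 1


def is_reference(name):
    """BN pictures are non-reference pictures; all others are references."""
    return not name.startswith("BN")


def simulate():
    """Return (picture, reference buffer, display buffer) after each decode."""
    last_use = {}
    for slot, pic in enumerate(DECODING_ORDER):
        for ref in REFERENCES[pic]:
            last_use[ref] = slot
    ref_buf, disp_buf, snapshots = [], [], []
    for slot, pic in enumerate(DECODING_ORDER):
        if is_reference(pic):
            ref_buf.append(pic)
        disp_buf.append(pic)
        # Drop references whose last user has now been decoded.
        ref_buf = [p for p in ref_buf
                   if p in KEPT_FOR_LATER or last_use.get(p, -1) > slot]
        # Drop pictures that are displayed in this slot or earlier.
        disp_buf = [p for p in disp_buf if display_slot(p) > slot]
        snapshots.append((pic, list(ref_buf), list(disp_buf)))
    return snapshots


if __name__ == "__main__":
    snaps = simulate()
    for pic, refs, disp in snaps:
        print(f"{pic:4} ref={refs} display={disp}")
    print("max reference buffer:", max(len(r) for _, r, _ in snaps))
    print("max display buffer:", max(len(d) for _, _, d in snaps))
```

With these assumptions the peak occupancies are three pictures in the reference picture buffer and two pictures in the display reordering buffer. P7 and BS4 are held in both buffers for several slots at the same time.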