In video coding systems, temporal redundancy is exploited using temporal prediction to reduce the amount of video data to be transmitted or stored. Neighboring pictures in a video sequence often bear great similarities, and simply coding picture differences can effectively reduce the transmitted information associated with static background areas. Nevertheless, moving objects or panned/zoomed scenes in the video sequence may result in substantial residues, which require higher bitrates to code. Consequently, Motion Compensated Prediction (MCP) is often used to exploit temporal correlation in video sequences. In MCP systems, Intra-coding is used to transmit an initial picture (an Intra-coded picture, or I-picture), and further I-pictures are inserted periodically to allow quick access to the compressed video data or to alleviate error propagation.
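The difference between plain picture differencing and motion compensated prediction can be illustrated with a minimal sketch. The block size, the search range, the sum-of-absolute-differences (SAD) cost, and all function names below are assumptions chosen for illustration; they are not taken from any particular standard or encoder.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_match(ref, cur, top, left, size, search):
    """Full search over +/- search pixels; returns (cost, (dy, dx))."""
    cur_block = get_block(cur, top, left, size)
    # Start from the zero-motion candidate, i.e. plain picture differencing.
    best = (sad(cur_block, get_block(ref, top, left, size)), (0, 0))
    height, width = len(ref), len(ref[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(cur_block, get_block(ref, y, x, size))
                if cost < best[0]:
                    best = (cost, (dy, dx))
    return best


# Hypothetical 8x8 pictures: a bright 2x2 object moves two pixels to the right.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        ref[y][x] = 200
    for x in (4, 5):
        cur[y][x] = 200

zero_motion_cost = sad(get_block(cur, 2, 4, 2), get_block(ref, 2, 4, 2))
mc_cost, mv = best_match(ref, cur, top=2, left=4, size=2, search=2)
print(zero_motion_cost, mc_cost, mv)
```

In this toy case the moving object leaves a large residue under plain differencing, while the motion-compensated residue for the same block vanishes once the displacement is found.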
In older coding systems such as MPEG-1/2, the video sequence is organized into multiple groups of pictures (GOPs), where different types of GOP may be used for different applications. A video sequence may be encoded using I-pictures only, which allows full random access to the compressed video. However, while such a system has low computational complexity, its coding efficiency is low. The IPPP GOP structure consists of I-pictures and Predicted pictures (P-pictures), where each P-picture is processed using forward motion prediction. The IPPP GOP structure usually achieves much better coding efficiency than I-picture-only processing. However, the computational complexity associated with the IPPP GOP structure is much higher than that of I-picture-only processing due to the required motion estimation. A system based on the IPPP GOP structure has low processing delay, since the processing of a current picture depends only on a previously coded picture and there is no need to wait for future pictures. Consequently, the IPPP GOP structure is suited for low-delay applications such as video conferencing. The IBBP GOP structure is another widely used GOP structure in the MPEG-1/2 standards. Besides I-pictures and P-pictures, the IBBP GOP structure uses one or more B-pictures between an I-picture and a P-picture, or between two P-pictures. In the MPEG-1/2 standards, the B-picture is a bi-directionally predicted picture based on one past picture and one future picture in the display order. The IBBP GOP structure requires higher computational complexity due to the bi-directional motion estimation. However, the IBBP GOP structure provides further bitrate reduction over the IPPP GOP structure.
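The delay difference between the IPPP and IBBP GOP structures can be sketched by deriving a coding order from a display-order pattern. The sketch below assumes the MPEG-1/2 convention that a B-picture is coded only after the next I-picture or P-picture in display order. The function names and the "lookahead" measure are hypothetical and serve only as a simple indicator of structural delay.

```python
def coding_order(display_types):
    """Map a display-order string such as "IBBP" to display indices
    listed in coding order (B-pictures follow their future anchor)."""
    order = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            # A B-picture must wait for its future reference picture.
            pending_b.append(index)
        else:
            # An I- or P-picture is coded first, then the waiting B-pictures.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Simplification: trailing B-pictures without a future anchor go last.
    order.extend(pending_b)
    return order


def max_lookahead(display_types):
    """Largest number of future pictures the encoder must wait for
    before it can code the picture at some coding position."""
    order = coding_order(display_types)
    return max(max(display_index - position, 0)
               for position, display_index in enumerate(order))


print(coding_order("IPPP"), max_lookahead("IPPP"))
print(coding_order("IBBPBBP"), max_lookahead("IBBPBBP"))
```

Under these assumptions the IPPP pattern is coded in display order with no lookahead, whereas the IBBP pattern requires the encoder to wait for the P-picture that follows each pair of B-pictures.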
In H.264/AVC, prediction types are established at a finer level of granularity called the slice. A slice is a spatially distinct region of a picture that is encoded separately from any other region in the same picture. In H.264/AVC, the terms I-slice, P-slice, and B-slice are used to refer to regions coded with the respective prediction types, instead of I-picture, P-picture, and B-picture. Typically, pictures are segmented into macroblocks, and individual prediction types can be selected on a macroblock basis. For H.264/AVC, an I-picture can contain only intra macroblocks, a P-picture can contain either intra macroblocks or predicted macroblocks, and a Bi-predictive picture (B-picture) can contain intra, predicted, or bi-predicted macroblocks. In H.264/AVC and the emerging High Efficiency Video Coding (HEVC), predicted pictures may use multiple previously decoded pictures as references, and the predicted pictures can have an arbitrary display-order relationship relative to the picture(s) used for prediction. While a B-picture in the MPEG-1/2 standards refers to a picture coded using bi-directional prediction, a B-picture in H.264 and HEVC refers to a bi-predictive picture that can use reference pictures in both reference picture list 0 and reference picture list 1.
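The relationship between picture type and permitted macroblock types described above can be summarized in a small lookup. This is only an illustrative sketch; the table name, the string labels, and the helper function are hypothetical and do not correspond to syntax elements of H.264/AVC.

```python
# Macroblock prediction types permitted for each picture type,
# following the description in the text above.
ALLOWED_MACROBLOCKS = {
    "I": {"intra"},
    "P": {"intra", "predicted"},
    "B": {"intra", "predicted", "bi-predicted"},
}


def is_valid_picture(picture_type, macroblock_types):
    """Return True if every macroblock type is permitted for the picture type."""
    return set(macroblock_types) <= ALLOWED_MACROBLOCKS[picture_type]


print(is_valid_picture("P", ["intra", "predicted", "predicted"]))
print(is_valid_picture("P", ["intra", "bi-predicted"]))
```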
In H.264 and HEVC, hierarchical GOP structures, including the hierarchical P GOP structure and the hierarchical B GOP structure, have been used to allow temporal scalability. On the other hand, a low-delay B GOP structure has also been disclosed, in which all B-pictures are low-delay B-pictures that use reference pictures from list 0 and list 1, where list 0 and list 1 contain only pictures prior to the B-picture in the display order. It is desirable to develop a new GOP structure that can take advantage of both the high coding efficiency and temporal scalability offered by the hierarchical GOP structure and the low-delay feature of low-delay B-pictures. Accordingly, the present invention discloses a low-delay hierarchical B GOP structure.
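The low-delay constraint on B-pictures stated above can be expressed as a simple check. In the sketch below, each picture is identified by an integer display-order index; the class name, field names, and function name are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BPicture:
    display_index: int   # position of the picture in display order
    list0: List[int]     # display indices of reference pictures in list 0
    list1: List[int]     # display indices of reference pictures in list 1


def is_low_delay(picture):
    """True if every reference in list 0 and list 1 precedes the
    picture in display order, so no future picture is required."""
    return all(ref < picture.display_index
               for ref in picture.list0 + picture.list1)


low_delay_b = BPicture(display_index=4, list0=[3, 2], list1=[3, 1])
conventional_b = BPicture(display_index=2, list0=[0], list1=[4])
print(is_low_delay(low_delay_b), is_low_delay(conventional_b))
```

A B-picture that references a later picture in display order, as in the second example, fails the check because the encoder would have to wait for that future picture.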