Video image compression coding is currently a booming field of research. Over the past two decades, video compression coding technology has developed continuously, and new video compression coding standards have kept emerging. The Moving Picture Experts Group (MPEG)-1 standard, formulated by the MPEG organization in 1991, is oriented to Video Compact Disk (VCD) applications and achieved great success in the Chinese market. The MPEG-2 standard, jointly formulated by the MPEG and the ITU in 1994, is oriented to digital television broadcast and Digital Video Disk (DVD) applications. The MPEG-2 standard is so far the most widely applied, most mature, and most influential video compression standard in the digital video broadcast and laser disk fields. Afterward, the MPEG launched MPEG-4, an object-oriented new-generation video compression coding standard, and the ITU launched the H.263 standard, oriented to videoconferencing and video communication, followed by H.263+ and H.263++. Currently, the latest video compression coding standards include the H.264/AVC (Advanced Video Coding) standard jointly formulated by the ITU and the MPEG, and the VC-1 (Video Codec 1) standard formulated by Microsoft. The H.264/AVC standard was promulgated by the ISO/IEC/ITU standardization organization as an international standard in March 2005, and the VC-1 standard was promulgated by the SMPTE standardization organization in April 2006. The trends in video compression coding technology are higher coding compression efficiency, better network compatibility, better user experience, and a wider application field.
In order to obtain high coding compression efficiency, current video compression coding technologies strive to remove redundant information within an image and between images, including temporal, spatial, statistical, and visual redundancy. For example, the H.264 standard improves coding efficiency through multiple technologies, including a fully reversible integer transform, multi-reference image prediction, multi-mode intra-frame prediction, Variable Block Size Motion Compensation (VBSMC), ¼-pixel interpolation, an in-loop deblocking filter, efficient entropy coding, and so on.
Multi-reference image prediction involves the intra-frame coded image, the inter-frame coded image, and the Group of Pictures (GOP). An intra-frame coded image is coded using only the image itself, without using other images as reference, and may be coded through an intra-frame prediction technology. For example, the I-frame is an intra-frame coded image. An inter-frame coded image is coded through an inter-frame prediction technology, in which predictive coding is performed for the image according to a reference image. Inter-frame coded images are of two types: the forward predictive coded image and the bidirectional predictive coded image. A bidirectional predictive coded image is an image for which inter-frame predictive coding is performed in both the forward direction and the backward direction. There may be one or more reference images in the forward direction or the backward direction. For example, the P-frame is a forward predictive coded image, and the B-frame is a bidirectional predictive coded image. A GOP is a group of coded images composed of one intra-frame coded image and the multiple inter-frame coded images subsequent to it. A GOP header may be used to assist random access and editing.
A reference image is an image used by an inter-frame coded image as a reference. An inter-frame coded image requires a reference image before inter-frame predictive coding can be performed. Likewise, a reference image is required for decoding the inter-frame coded image at the decoder. A reference image is also called a key image. A reference image may be an intra-frame coded image, such as an I-frame, or an inter-frame coded image, such as a P-frame, but cannot be a B-frame.
A non-reference image is an image not used by any other image as a reference. In some applications, the non-reference image may be discarded, which may be used to support scalability on the time axis. Here, the non-reference image refers to the bidirectional predictive coded image, namely, the B-frame.
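The distinction between reference and non-reference images described above can be illustrated with a minimal sketch. The `Frame` type, the helper names, and the example frame pattern are hypothetical and chosen only for illustration; the sketch assumes, as stated above, that only I-frames and P-frames serve as reference images and that B-frames may be discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int  # position of the frame in the sequence
    kind: str   # "I", "P", or "B"


def is_reference(frame: Frame) -> bool:
    # Assumption taken from the text: I- and P-frames may be reference
    # images, whereas a B-frame is never used as a reference.
    return frame.kind in ("I", "P")


def drop_non_reference(frames):
    # Discarding non-reference images lowers the frame rate without
    # removing any image that another image depends on.
    return [f for f in frames if is_reference(f)]


# Hypothetical GOP pattern used only for illustration.
gop = [Frame(i, k) for i, k in enumerate("IBBPBBPBBP")]
kept = drop_non_reference(gop)
kept_kinds = "".join(f.kind for f in kept)       # "IPPP"
kept_indices = [f.index for f in kept]           # [0, 3, 6, 9]
```

In this illustrative pattern, discarding the B-frames keeps every third image, so the remaining stream still decodes while the frame rate drops to one third.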
In the case that the multi-reference image prediction technology is applied, random access may lead to loss of the reference image.
Random access refers to the capability of decoding a bit stream and recovering the decoded images when decoding starts from a point other than the start point of the bit stream. Random access may be of two types: instant random access, where correct decoding starts at the cut-in point of the bit stream; and gradual random access, where a period of time elapses between the cut-in point and the point from which the bit stream can be correctly decoded. Random access is directly related to user experience. Situations that require random access include channel switching, bit stream switching, editing and splicing, random positioning in program playback, fast-forward, fast-backward, etc. Different services impose different requirements on random access performance. For example, for a broadcast service, the Digital Video Broadcasting (DVB) standard stipulates that a random access cut-in point needs to occur every 0.5 s; for video communication, videoconferencing, and Pay Per View (PPV), lower requirements are imposed on random access performance. In order to support random access, the video bit stream must carry a certain amount of redundant information. Therefore, random access performance and coding efficiency conflict with each other, and video coding standards need to strike a tradeoff between the two.
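As a rough illustration of how the 0.5 s requirement constrains the stream, the following sketch computes the largest spacing, in frames, between consecutive random access cut-in points. The function name and the example frame rates are assumptions made for illustration and are not taken from the DVB standard.

```python
import math


def max_frames_between_access_points(frame_rate_hz: float,
                                     max_interval_s: float) -> int:
    # Largest whole number of frames that fits in the permitted interval;
    # cut-in points spaced further apart would exceed the interval.
    return math.floor(frame_rate_hz * max_interval_s)


# Hypothetical frame rates, with the 0.5 s interval mentioned above.
spacing_25 = max_frames_between_access_points(25, 0.5)  # 12
spacing_30 = max_frames_between_access_points(30, 0.5)  # 15
```

The shorter this spacing, the more often an access point must be coded, which is where the loss of coding efficiency discussed above comes from.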
In the case of multi-reference images, a P-frame may bypass the I-frame and use an image before the I-frame as a reference image. In that case, if the image before the I-frame is erroneous, the I-frame cannot completely prevent error spreading. Moreover, when random access occurs at the I-frame, the images before the access point are unavailable, yet the inter-frame coded image needs a reference image before the access point in order to be decoded. Therefore, the images after the cut-in point cannot be decoded.
The loss of the reference image may also occur for other reasons, for example, editing (cutting or splicing) of the bit stream. Unavailability of the reference images before the cut-in point, or a change of the image content, may make it impossible to decode subsequent images or may lead to decoding errors. Further, the reference image may be lost during transmission. For instance, when a bit stream is transmitted on an error-prone channel, burst errors or accumulated errors may make it impossible to decode the image correctly.
The H.264 standard puts forward two solutions for preventing the error spreading caused by loss of the reference image.
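The failure described above can be reproduced with a small dependency model. The sketch below is illustrative only: the stream layout, the `(kind, refs)` representation, and the function name are hypothetical, and frames are assumed to be listed in decoding order.

```python
def decodable_from(frames, cut_in):
    """Return the indices that can be decoded when decoding starts at cut_in.

    frames is a list of (kind, refs) tuples in decoding order, where refs
    holds the indices of the reference images the frame depends on.
    """
    decoded = set()
    for idx in range(cut_in, len(frames)):
        _kind, refs = frames[idx]
        # A frame is decodable only if every one of its references has
        # itself been decoded after the cut-in point.
        if all(r in decoded for r in refs):
            decoded.add(idx)
    return decoded


# Hypothetical stream: the P-frame at index 3 bypasses the I-frame at
# index 2 and also references the P-frame at index 1.
stream = [
    ("I", []),
    ("P", [0]),
    ("I", []),
    ("P", [2, 1]),
    ("P", [3]),
]

from_start = decodable_from(stream, 0)    # {0, 1, 2, 3, 4}
from_i_frame = decodable_from(stream, 2)  # {2}
```

When decoding starts at the second I-frame, only that I-frame is recovered: the next P-frame lacks one of its references, and the failure carries over to every image that depends on it.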
I. An Instantaneous Decoder Refresh (IDR) image is introduced. The IDR image is a new image type, and is an intra-frame coded image. The images after the IDR image do not use the images before the IDR image as reference. The IDR image and all images after it can be decoded correctly. The first image in a video sequence should be an IDR image. The IDR image may serve as a random access cut-in point. However, in the process of implementing the present disclosure, the inventor finds at least the following defects in the foregoing method:
Because the images after the IDR image do not use the images before the IDR image as reference, the images after the IDR image at the random access cut-in point are unable to make full use of the multi-reference image technology. Therefore, the coding efficiency may be reduced.
II. The Gradual Decoding Refresh (GDR) technology based on an isolated area is introduced. The intra-frame macro block refreshing technology based on an isolated area is applied, and random access can be achieved in the case of multi-reference images. The random access cut-in point may be the P-frame or B-frame. However, in the process of implementing the present disclosure, the inventor finds at least the following defects in the foregoing method:
The use of the isolated area and the intra-frame macro block is restricted, and many restrictions are imposed on the coding tools such as loop filter, intra-frame or inter-frame prediction, and even scanning. Consequently, the coding efficiency is reduced. Compared with the IDR technology, the foregoing method leads to loss of more coding gain.
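The cost of the two solutions above can be sketched with a simplified model. This is not the H.264 reference picture marking process or its isolated-region mechanism; the function names, the frame patterns, the two-reference limit, and the assumption that one region of the picture is intra-refreshed per frame are all hypothetical and serve only to illustrate the trade-off.

```python
def candidate_references(kinds, max_refs):
    """List, per frame, the reference candidates (most recent first).

    kinds is a list of "IDR", "I", or "P" in decoding order. An IDR image
    empties the reference buffer, so no later image can reference an image
    that precedes the IDR image.
    """
    if max_refs < 1:
        raise ValueError("max_refs must be at least 1")
    buffer = []      # indices of previously decoded reference images
    candidates = []
    for idx, kind in enumerate(kinds):
        if kind == "IDR":
            buffer.clear()
        candidates.append(buffer[-max_refs:][::-1] if kind == "P" else [])
        buffer.append(idx)
    return candidates


def clean_fraction_per_frame(num_regions):
    """Fraction of the picture that is correct after each refresh frame.

    Assumes one region is intra-coded per frame and that refreshed regions
    are predicted only from refreshed regions, so they stay correct.
    """
    clean = set()
    fractions = []
    for frame in range(num_regions):
        clean.add(frame)  # region refreshed by intra coding in this frame
        fractions.append(len(clean) / num_regions)
    return fractions


with_idr = candidate_references(["IDR", "P", "P", "IDR", "P", "P"], max_refs=2)
with_i = candidate_references(["IDR", "P", "P", "I", "P", "P"], max_refs=2)
refresh = clean_fraction_per_frame(4)  # [0.25, 0.5, 0.75, 1.0]
```

In this model, the P-frame at index 4 has two reference candidates after an ordinary I-frame (`[3, 2]`) but only one after an IDR image (`[3]`), which reflects the reduced use of multi-reference prediction. With gradual refresh, the picture becomes fully correct only after as many frames as there are regions.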
The Audio Video coding Standard (AVS) also adopts the multi-reference image technology, and allows using two frames as reference frames in the forward direction. Moreover, the reference features of subsequent images are restricted through a sequence header. That is, the prediction reference feature of the first P-frame after the first I-frame is restricted in order to implement random access, wherein the I-frame is after the sequence header. However, in the process of implementing the present disclosure, the inventor finds at least the following defects in the foregoing method:
Because the prediction reference feature of the first P-frame after the I-frame after the sequence header is restricted, the coding efficiency is reduced.
It can be seen from the above description that the coding efficiency is low when random access is implemented through the technologies in the prior art.