In known MPEG/H.26x video coding standards (e.g. MPEG-1, MPEG-2, MPEG-4, MPEG-4 AVC/H.264, H.263, VC-1), there are basically three types of pictures: I (intraframe coded) pictures, P (interframe coded) pictures and B (bi-directionally predicted) pictures. An I picture does not use other pictures as reference, so it can be used as a re-synchronisation point in error-prone video transmission. It can also be used as a random access point in video editing and fast forward/backward play. A P picture can use one or more previous pictures as reference, which increases the coding efficiency through prediction. B pictures can use previous and subsequent pictures for prediction and further improve the coding efficiency.
Video sequences are usually coded in a group of pictures (GOP) structure, wherein several P (P1, P2, P3) and/or B pictures are coded following one I picture, as is shown in FIG. 1. However, this GOP structure has some disadvantages, especially in the following two kinds of applications:
a) Error Resilience
If picture P1 is lost, e.g. due to a transmission channel error, then the subsequent P pictures cannot be reconstructed correctly, and the error will propagate temporally and cause visible artefacts. Although error concealment can be employed at the decoder side, it cannot remove the artefacts very well because some vital information is lost.
b) Storage Medium Recording, e.g. on DVD or VCR
DVD (digital versatile disc) or VCR (video cassette recorder) devices usually require functions like forward, backward, stop, pause, fast forward, fast backward and random access. However, the known MPEG GOP structure is designed for forward play only and complicates the reverse play operation. A simple fast backward play can be achieved by accessing only the I pictures in backward direction, but if smoother picture-by-picture reverse play is desired, much more complexity, bandwidth and/or storage buffer will be required. For example, one can decode the GOP up to the current frame, and then go back and decode from the beginning of the GOP again up to the next frame to be displayed. However, this requires a high throughput bandwidth. Otherwise, a large storage buffer is needed if the bit stream is to be decoded only once.
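The trade-off between repeated decoding and buffering in picture-by-picture reverse play can be illustrated with a small calculation. The following is a minimal sketch, assuming a GOP that contains only one I picture followed by P pictures, each P picture referencing its immediate predecessor; the function names are hypothetical and chosen for illustration only.

```python
def decodes_for_reverse_play(gop_len: int) -> int:
    """Count decode operations when every displayed picture is reached
    by restarting at the I picture (multi-pass decoding, no buffering).

    Assumption: picture 0 is the I picture and picture k references k - 1.
    """
    # Showing picture k requires decoding pictures 0..k, i.e. k + 1 decodes.
    return sum(k + 1 for k in range(gop_len))


def frames_buffered_single_pass(gop_len: int) -> int:
    """Count frames that must be stored when the GOP is decoded only once.

    Assumption: all decoded pictures are kept until shown in reverse order.
    """
    return gop_len
```

Under these assumptions, a GOP of 12 pictures needs 78 decode operations for multi-pass reverse play, compared with 12 for forward play, or alternatively a buffer of 12 decoded frames if the GOP is decoded only once.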
Some different GOP structures have been proposed to solve the above problems. For error resilience, a video redundancy coding method has been disclosed by S. Wenger, G. Knorr, J. Ott, F. Kossentini, “Error Resilience Support in H.263+”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 7, November 1998, for H.263+ codec applications. This method divides the video sequence into two or more chains in such a way that every picture is assigned to one of the chains. Each chain is coded independently. A GOP structure using two prediction chains is shown in FIG. 2. In case one of these chains is damaged because of a packet loss, the remaining chains stay intact and can be decoded and displayed. It is possible to continue decoding the damaged chain, or to perform some error concealment, by using the information in the other undamaged chain, which leads to only a slight subjective quality degradation. It is also possible to stop decoding the damaged chain, which will only lead to a drop of the frame rate, and this has much less effect on subjective quality than other error artefacts. In both cases the resulting error resilience is much better than for the FIG. 1 GOP prediction structure. However, this structure does not support reverse play.
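The effect of splitting the pictures into independent prediction chains can be sketched as follows. This is a simplified model and not the method of the cited paper: it assumes that picture 0 is an I picture shared by all chains, that the remaining pictures are assigned to the chains in round-robin order, and that each picture references only the previous picture of its own chain. The function name is hypothetical.

```python
def surviving_pictures(num_pictures: int, num_chains: int, lost: int) -> list[int]:
    """Return the pictures that still decode correctly after `lost` is dropped.

    Assumptions (illustrative only): picture 0 is an I picture, picture
    t >= 1 belongs to chain (t - 1) % num_chains, and prediction never
    crosses from one chain to another.
    """
    if lost == 0:
        return []  # losing the shared I picture breaks every chain

    def chain_of(t: int) -> int:
        return (t - 1) % num_chains

    # The I picture always survives; a later picture survives if it comes
    # before the loss or belongs to a chain other than the damaged one.
    return [0] + [
        t for t in range(1, num_pictures)
        if t < lost or chain_of(t) != chain_of(lost)
    ]
```

With nine pictures and a loss of picture 1, a single chain (`num_chains=1`) leaves only picture 0 decodable, whereas two chains leave pictures 0, 2, 4, 6 and 8 intact, which corresponds to the halved frame rate described above.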
For reverse play, C. W. Lin, J. Zhou, J. Youn, M. T. Sun, “MPEG Video Streaming with VCR Functionality”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 3, March 2001, have proposed to add a reverse-encoded bit stream in the server, i.e. in the encoding process. Upon finishing the encoding and reaching the last picture of the video sequence, the video pictures are encoded in the reverse order to generate a reverse-encoded bit stream. If the server has only the forward-encoded bit stream (i.e. the original sequence is unavailable), the forward bit stream can be decoded up to two GOPs each time in the reverse direction (i.e. from the last GOP to the first GOP) and the video sequence is then re-encoded in the reverse order. The generation of the reverse-encoded bit stream is performed off-line. However, each picture is encoded twice and hence the bit stream size is almost doubled.
T. Fang, L. P. Chau, “An error-resilient GOP structure for robust video transmission”, IEEE Transactions on Multimedia, Vol. 7, No. 6, December 2005, have proposed a new GOP structure which takes both error resilience and VCR reverse play into account. By putting the I picture (In) in the middle of each GOP, the predicted P pictures are partitioned into two parts: half of them (Pn−1, . . . , Pm+i+1) are backward-predicted encoded and half of them (Pn+1, . . . , Pn+j) are forward-predicted encoded, as shown in the corresponding GOP structure (without B pictures) in FIG. 3. The subscript is the temporal number of the picture in the original video sequence, where the subscripts are monotonically increasing with i>1, n−1>m+i+1, j>1, and k−1>n+j+1. If B pictures are included, the structure is virtually unaffected. Obviously, if one P picture is corrupted, it will affect at most half of the GOP, while the other half of the GOP, which is arranged at the other side of picture In, will not be affected. In fact, this GOP structure is another form of two prediction chains, wherein one chain is forward and the other is backward.
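The claim that a single corrupted P picture affects at most half of such a GOP can be checked with a small model. The sketch below is an illustration under stated assumptions and not code from the cited paper: pictures are identified by their temporal numbers, the GOP spans `first` to `last` with the I picture at `i_pos`, and each P picture references its neighbour on the side of the I picture. All names are hypothetical.

```python
def corrupted_by_loss(first: int, i_pos: int, last: int, lost: int) -> list[int]:
    """Return the pictures of one GOP that cannot be reconstructed
    when picture `lost` is missing.

    Assumptions (illustrative only): pictures first..i_pos-1 form a
    backward-predicted chain starting at the I picture, and pictures
    i_pos+1..last form a forward-predicted chain starting at the I picture.
    """
    if not first <= i_pos <= last or not first <= lost <= last:
        raise ValueError("positions must satisfy first <= i_pos, lost <= last")
    if lost == i_pos:
        return list(range(first, last + 1))  # both chains depend on the I picture
    if lost < i_pos:
        # Backward chain: everything temporally before the lost picture
        # is predicted from it, directly or indirectly.
        return list(range(first, lost + 1))
    # Forward chain: everything temporally after the lost picture is affected.
    return list(range(lost, last + 1))
```

For a GOP covering pictures 1 to 9 with the I picture at position 5, losing picture 4 corrupts pictures 1 to 4 only, and losing picture 7 corrupts pictures 7 to 9 only; the chain on the other side of the I picture is untouched in both cases.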
On the one hand, this GOP structure makes reverse play partly easier since one half of the P frames in the GOP are already reverse-encoded. On the other hand, this GOP structure still has disadvantages in both error resilience and reverse play. If Pm+1 is lost, then Pm+1 to Pm+i are corrupted and error artefacts will be noticeable in this time period. Although the picture chain Pn−1 to Pm+i+1 may be received correctly, it offers no help in decoding the pictures of the time period from Pm+1 to Pm+i. Therefore this GOP structure cannot provide error resilience as good as that of the GOP structure depicted in FIG. 2. Further, this GOP structure cannot provide a continuous reverse-play function because half of the consecutive P frames are still forward encoded. In detail, the processing order for reverse play is: Ik→Pk−1→ . . . →Pn+j+1→In→Pn−1→ . . . →Pm+i+1→Im . . . . Hence, there are gaps at Pn+j to Pn+1 and at Pm+i to Pm+1 that will cause a large jitter in reverse play. If the pictures from Pn+j to Pn+1 and from Pm+i to Pm+1 really need to be displayed, then normal multi-pass decoding or a large buffer is necessary, which is the same problem as in the standard GOP structure.
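The reverse-play processing order and the resulting gaps can be reproduced with a short sketch. It is a simplified model under stated assumptions, not an implementation from the cited paper: the I pictures sit at the temporal positions given in `i_positions`, each I picture except the last is followed by `forward_len` forward-predicted P pictures, and all remaining P pictures are backward-predicted from the next I picture. The function and parameter names are hypothetical.

```python
def reverse_play_plan(i_positions: list[int], forward_len: int) -> tuple[list[int], list[int]]:
    """Split the pictures, visited in reverse temporal order, into those
    that decode directly during reverse play and those that form gaps.

    Assumptions (illustrative only): `i_positions` is sorted in ascending
    order and `forward_len` is smaller than the spacing between I pictures.
    """
    # Forward-predicted pictures directly follow each I picture in time.
    forward = {
        i + d
        for i in i_positions[:-1]
        for d in range(1, forward_len + 1)
    }
    direct, gaps = [], []
    for t in range(i_positions[-1], i_positions[0] - 1, -1):
        if t in forward:
            gaps.append(t)    # needs multi-pass decoding or buffering
        else:
            direct.append(t)  # I picture or backward-predicted P picture
    return direct, gaps
```

With I pictures at positions 0, 8 and 16 and three forward-predicted pictures after each I picture, the directly decodable order is 16, 15, 14, 13, 12, 8, 7, 6, 5, 4, 0, while pictures 11, 10, 9 and 3, 2, 1 form the gaps that correspond to Pn+j to Pn+1 and Pm+i to Pm+1 in the text.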