The present invention relates to video coding systems and, in particular, to techniques that maintain synchronization between an encoder and a decoder in such video coding systems in the presence of transmission errors.
In video coding systems, a video encoder may code a source video sequence into a coded representation that has a smaller bit rate than does the source video and, thereby, may achieve data compression. The encoder may code processed video data according to any of a variety of different coding techniques to achieve compression. One common technique for data compression uses predictive coding (e.g., temporal/motion predictive coding). For example, some frames in a video stream may be coded independently of other frames (I-frames) and some other frames (e.g., P-frames or B-frames) may be coded using other frames as prediction references. P-frames may be coded with reference to a single previously-coded frame (called a “reference frame”) and B-frames may be coded with reference to a pair of previously-coded reference frames. The resulting compressed sequence (bit stream) may be transmitted to a decoder via a channel. To recover the video data, the bit stream may be decompressed at the decoder by inverting the coding processes performed by the encoder, yielding a recovered video sequence.
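By way of illustration only, the following minimal sketch shows temporal predictive coding in its simplest form. It assumes frames are short lists of integer sample values, codes the first frame independently (as an I-frame) and codes every later frame as a residual against the immediately preceding frame (as a P-frame). The function names `encode` and `decode` are hypothetical, and the sketch omits motion compensation, quantization and entropy coding, which practical coders would employ.

```python
def encode(frames):
    """Code the first frame as an I-frame and the rest as P-frame residuals."""
    coded = []
    reference = None
    for frame in frames:
        if reference is None:
            # I-frame: coded independently of any other frame.
            coded.append(("I", list(frame)))
        else:
            # P-frame: only the difference from the reference frame is coded.
            residual = [cur - ref for cur, ref in zip(frame, reference)]
            coded.append(("P", residual))
        # The frame just coded becomes the reference for the next frame.
        reference = list(frame)
    return coded


def decode(coded):
    """Invert encode(): rebuild each frame from its payload and reference."""
    recovered = []
    reference = None
    for frame_type, payload in coded:
        if frame_type == "I":
            frame = list(payload)
        else:
            frame = [ref + res for ref, res in zip(reference, payload)]
        recovered.append(frame)
        reference = frame
    return recovered


source = [[10, 10, 12, 12], [10, 11, 12, 13], [11, 11, 13, 13]]
coded = encode(source)
recovered = decode(coded)
```

In this sketch the decoder can rebuild a P-frame only if it holds the same reference frame that the encoder used, which is why a lost frame may corrupt every frame predicted from it.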
A variety of coding protocols, such as H.264 and the forthcoming HEVC (currently under draft as ITU-T doc. JCTVC-J1003_d7), define processes that develop coding states during a video coding session. That is, as frames are coded in their respective I-, P- or B-frame coding modes, they may be coded using parameters that are selected with reference to a coding state that is defined by previously-coded frames. Thus, proper coding and decoding of a given frame may rely on the coding state that is established by other frames. When frames are lost, for example, due to transmission errors that arise between the encoder and the decoder, synchronization of state can be lost.
Modern coding protocols define classes of frames, called Random Access Pictures (“RAP frames” herein), that reset coder states to known values. Examples of RAPs are Instantaneous Decoding Refresh (“IDR”) pictures in H.264/AVC, and IDR pictures, Broken Link Access (“BLA”) pictures and Clean Random Access (“CRA”) pictures in HEVC. Typically, each RAP frame defines boundaries between other coding structures supported by the protocol such as Groups of Pictures (“GOPs”). Upon receiving a RAP frame, a decoder clears its reference picture cache and resets internal states such as frame numbers, picture order counts, temporal motion vector prediction caches, etc., based on a specified schedule. If such RAP frames are lost during coding sessions, their loss can cause disastrous mismatches between the encoder's state and the decoder's state, as the encoder codes new frames based on an expectation that the decoder will have processed the RAP frame properly (e.g., cleared the reference picture cache and reset its states). When a RAP picture is lost and a decoder receives subsequent frames after the RAP, it can continue decoding these frames with the wrong states. In real-time video communication applications, therefore, the use of RAP pictures in the middle of the transmission usually is avoided.
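The mismatch described above can be sketched with a toy state model. The sketch assumes, purely for illustration, that the coding state consists of a frame counter and a reference picture cache; the names `CodecState` and `run_session` are hypothetical and do not correspond to any protocol-defined structure. The encoder processes every frame, while the decoder processes only the frames that survive transmission.

```python
from dataclasses import dataclass, field


@dataclass
class CodecState:
    """Toy model of state that the encoder and decoder must keep in sync."""
    frame_num: int = 0
    reference_cache: list = field(default_factory=list)

    def process(self, frame_type: str, frame_id: int) -> None:
        if frame_type == "RAP":
            # A RAP clears the reference picture cache and resets counters.
            self.reference_cache.clear()
            self.frame_num = 0
        self.reference_cache.append(frame_id)
        self.frame_num += 1


def run_session(frames, lost_ids=frozenset()):
    """Run encoder and decoder over the frames, dropping those in lost_ids."""
    encoder, decoder = CodecState(), CodecState()
    for frame_type, frame_id in frames:
        encoder.process(frame_type, frame_id)
        if frame_id not in lost_ids:
            # Only frames that survive the channel reach the decoder.
            decoder.process(frame_type, frame_id)
    return encoder, decoder


FRAMES = [("RAP", 0), ("P", 1), ("P", 2), ("RAP", 3), ("P", 4)]

# Error-free channel: both sides finish in the same state.
enc_ok, dec_ok = run_session(FRAMES)

# Mid-session RAP (frame 3) is lost: the decoder never resets.
enc_lost, dec_lost = run_session(FRAMES, lost_ids={3})
```

In the lossy run of this sketch, the encoder's cache holds only frames 3 and 4 after the reset, whereas the decoder still holds frames 0, 1, 2 and 4 and carries a stale frame counter, so frame 4 would be decoded against a state the encoder did not intend.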
RAPs can be valuable because they can be used to recover a video coding session from any previous mismatch or invalid state inside the decoder. Therefore, the inventors perceive that it would be useful to engage in coding protocols in which an encoder may transmit RAP pictures in the middle of a coding session without suffering serious consequences when RAPs are lost during transmission.