Many video compression standards, e.g., H.264, have been widely used in real time video transmission systems to satisfy network bandwidth requirements. One of the major challenges in such systems is to stop error propagation and recover from a transmission error. In typical solutions, a video encoder embeds a “sync point” in the video stream when a transmission error is detected. A sync point is a coded frame that codes video either from a known good starting point between the encoder and decoder, if one exists, or from scratch (e.g., an instantaneous decoder refresh (“IDR”) frame). Thus, the sync frame stops error propagation. However, this approach works only when the sync point can be received and processed properly by a video decoder, which does not always occur in practical communication environments. There are various scenarios in which error propagation cannot be stopped even when a sync frame is properly received by a decoder and acknowledged to the encoder.
In modern compression standards, video decoders typically differentiate reference frames from each other through frame index information that is carried within the frame/slice syntax and serves to label each reference frame. This information usually consists of a few numbers that identify the display and coding order of the frame. For example, in H.264, the “frame_num” parameter, primarily, and the picture order count (“POC”), secondarily, are the numbers used for this purpose. In the HEVC standard currently being developed by the ITU-T and ISO/IEC, a POC field serves the same purpose. To save bits, encoders and decoders constrain the number of bits used to identify these reference frames. For example, coding the frame_num parameter with 8 bits in H.264 allows the parameter to take any value from 0 to 255; after 255, the index wraps around to 0.
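The wrap-around behavior of such a bounded frame index can be sketched as follows. This is an illustrative model only, not code from any standard; the 8-bit width matches the H.264 example above, and the helper name is invented for illustration:

```python
# Hypothetical illustration of a bounded frame index such as H.264's
# frame_num when coded with 8 bits: values run 0..255, then wrap to 0.

FRAME_NUM_BITS = 8
MAX_FRAME_NUM = 1 << FRAME_NUM_BITS  # 256 distinct values: 0..255

def next_frame_num(frame_num: int) -> int:
    """Advance the index by one, wrapping modulo 2**FRAME_NUM_BITS."""
    return (frame_num + 1) % MAX_FRAME_NUM

# The index increments normally...
assert next_frame_num(0) == 1
assert next_frame_num(100) == 101
# ...until the maximum value is reached, after which it wraps to 0.
assert next_frame_num(255) == 0
```

The modulo arithmetic is why a long run of coded-but-undelivered frames can bring the index back to a value the decoder has already seen, as discussed below.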
When network errors occur, it is possible that a large contiguous segment of video frames will be dropped, either in the network transmission layer or the media transfer layer. In one scenario, this may be caused by severe network errors affecting the network transmission layer. To reduce bandwidth, coding protocols do not require all coded reference frames to be acknowledged by the transmission layer and, therefore, an encoder may not have an accurate estimate of the state of the reference picture buffer at the receiver side. In another scenario, the network error may be moderate, resulting in losses of single isolated frames in the network transmission layer. However, at the sender side, unless the encoder produces a new sync frame so that the error can be stopped, all the frames in the transmission buffer that follow the problematic frame in coding order will be continuously dropped, since they refer to an erroneous reference frame. In a different scenario, the network error may be moderate and a segment of frames following a lost frame may already have been sent out from the sender by the time the encoder is notified of the error (since the sender learns of the transmission error with some delay). However, at the receiver side, since the previous frame was flagged as lost, the receiver may drop all following frames in coding order, even if they are correctly received, because they cannot be properly reconstructed without the lost reference frame. Thus, a large segment of frames could be dropped even when the network error condition is moderate. Meanwhile, the encoder keeps encoding and updating (e.g., incrementing) the frame index, even if these frames never reach the receiver side. Depending on the number of frames coded between the time the network error occurred and the generation of the sync point, the frame index may be recycled.
If an encoder codes a sync frame in response to a transmission error, the new sync frame may have the same index as a reference frame that is already stored in the decoder reference buffer, or as a frame that has been received, was not dropped at the media transfer layer, but has not yet been decoded. Coding standards do not specify how to handle the situation in which two reference frames “collide,” i.e., have the same frame index in the decoder reference buffer. Nor do coding standards specify decoding behavior when two consecutively received frames have the same index. Thus, the behavior of decoders in such cases is undefined and unpredictable. This could lead to corruption of the sync frame and of future frames that were supposed to predict from the sync frame at the receiver side. Thus, the purpose of sending a sync frame is defeated.
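The collision described above can be sketched with a simplified model of a decoder reference buffer keyed by frame index. The class and method names here are invented for illustration and do not come from any standard:

```python
# Simplified model of a decoder reference buffer indexed by frame number.
# When many frames are dropped and the encoder's bounded index wraps
# around, a new sync frame may reuse an index already in the buffer.

class ReferenceBuffer:
    def __init__(self):
        self.frames = {}  # frame index -> decoded frame data

    def store(self, frame_num, frame):
        """Store a reference frame; report whether its index collides
        with a frame already held in the buffer. Standards leave the
        decoder's behavior in the colliding case undefined."""
        collision = frame_num in self.frames
        self.frames[frame_num] = frame
        return collision

buf = ReferenceBuffer()
buf.store(5, "reference frame A")        # stored normally, no collision
# ...network errors drop a long run of frames; the encoder's 8-bit
# index wraps all the way around and reaches 5 again...
collided = buf.store(5, "sync frame")    # same index, different frame
assert collided  # undefined territory: the standard gives no rule here
```

In a real decoder the outcome of such a collision depends on the implementation, which is precisely the unpredictability the passage above describes.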
If a video encoder receives an acknowledgement indicating that the sync frame has been properly passed to the decoder, the encoder/sender may not send or request further sync frames. The encoder may resume ordinary coding processes notwithstanding the fact that its estimate of the decoder's reference buffer state is inaccurate. Since subsequent video frames would be predicted from the sync frame, errors will propagate throughout the rest of the video session at the receiver side as the system operates under the assumption that there is no error.