This invention relates to a data processing system arranged to perform a resynchronisation mechanism for a decoder.
Real-time streaming of multimedia content over the internet has become an increasingly common application in recent years. A wide range of multimedia applications, such as on-demand TV, live TV viewing, video conferencing, net meetings, video telephony and many others rely on end-to-end streaming solutions. Unlike a “downloaded” video file, which may be retrieved first in “non-real” time and viewed or played back later, streaming video applications require a video source to encode and to transmit a video signal over a network to a video receiver, which must decode and display the video signal in real time. The receiving device receives encoded video data packets from the network and transfers the packets to a video decoder for decoding.
Compression techniques for transmitting video data can use so-called reference frames. When compressing blocks of video data, the encoding process can generate intra frames (I-frames). An I-frame is a compressed version of a frame which can be decompressed using only the information in the I-frame itself, without reference to other frames. I-frames are sometimes referred to as key frames. The encoding process can also generate inter or predictive frames (P-frames), which are produced by predictive inter-frame coding based, directly or indirectly, on a reference frame. The reference frame can be the preceding frame, or it could be a different earlier or later frame in a sequence of frames.
During the process of I-frame and P-frame coding, information in the image is processed block-wise and a Discrete Cosine Transform (DCT) is applied to each block. The resulting DCT coefficients represent the strength of the various frequencies in the block. The coefficients of each block are quantized at various levels, so as to achieve the desired trade-off between quality and bit-rate. The encoder then reconstructs the quantized frame by applying inverse quantization and inverse DCT in exactly the same way as a decoder would, and uses the result as a potential reference frame for subsequent frames. This replication of the decoder functionality helps keep the encoder and decoder synchronized.
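The transform, quantization and reconstruction steps described above can be illustrated with a minimal sketch. It assumes an 8x8 block, an orthonormal DCT-II and a single uniform quantization step; none of these choices is specified in the text, and the function names are hypothetical. The point it shows is that the encoder derives its reference from the quantized levels using the same `reconstruct` routine a decoder would run, so both sides hold identical reference data.

```python
import math

N = 8  # block size; an assumption for illustration only


def dct_matrix(n=N):
    """Orthonormal DCT-II basis; row k holds the k-th frequency."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose of C


def forward_dct(block):
    """2-D DCT of a square block: C * block * C^T."""
    return matmul(matmul(C, block), CT)


def inverse_dct(coeffs):
    """2-D inverse DCT: C^T * coeffs * C."""
    return matmul(matmul(CT, coeffs), C)


def quantize(coeffs, step):
    """Uniform quantization; a larger step lowers bit-rate and quality."""
    return [[round(c / step) for c in row] for row in coeffs]


def reconstruct(levels, step):
    """Inverse quantization followed by inverse DCT.

    Run by the decoder, and replicated in the encoder so that both
    derive the same reference block from the same quantized levels.
    """
    dequantized = [[level * step for level in row] for row in levels]
    return inverse_dct(dequantized)


def encode_block(block, step):
    """Return the quantized levels and the encoder-side reference block."""
    levels = quantize(forward_dct(block), step)
    return levels, reconstruct(levels, step)
```

Because the encoder predicts from `reconstruct(levels, step)` rather than from the original block, the quantization error is already accounted for in its reference and does not accumulate as a mismatch between the two sides.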
Problems can arise when a streaming video signal is transmitted across networks, such as the Internet. For example, a significant packet loss rate across the transmission network often requires retransmission of the lost packets. Typically, a lost data packet needs to be recovered before the time at which the corresponding frame must be decoded. If the lost packet is not recovered in time, the current frame being processed as well as the subsequent frames can be adversely affected because of the predictive coding.
Loss of packets can cause a loss of synchronisation between the states of the encoder and decoder. Packets missing from an encoded frame corrupt the reference frame(s) derived from it, which in turn causes artifacts in the displayed frames. Due to real-time play constraints in some applications such as video telephony, the decoder may have to decode an incomplete frame and thus it will lose state synchronization with the encoder. After the incomplete frame has been decoded, any packets belonging to that frame that arrive after it has been decoded, e.g. packets that arrive late, are repaired by FEC or are retransmitted, cannot help regain decoder state synchronization.
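The way a single loss propagates through predictive coding can be shown with a toy model. The sketch below is an illustration under strong assumptions: frames are short lists of integers, coding is lossless, each P-frame carries the difference from the previous frame, and a lost frame is concealed by repeating the decoder's current reference. All names are hypothetical.

```python
def encode(frames, key_interval):
    """Encode frames as ('I', samples) or ('P', residual vs. previous frame)."""
    packets = []
    ref = None
    for n, frame in enumerate(frames):
        if n % key_interval == 0:
            packets.append(("I", list(frame)))
        else:
            packets.append(("P", [a - b for a, b in zip(frame, ref)]))
        ref = frame  # lossless toy model: reconstruction equals the original
    return packets


def decode(packets, lost):
    """Decode packets; frame indices in `lost` never reach the decoder."""
    decoded = []
    ref = None
    for n, (kind, payload) in enumerate(packets):
        if n in lost:
            # Concealment (assumed): repeat the reference, or zeros if none yet.
            frame = list(ref) if ref is not None else [0] * len(payload)
        elif kind == "I":
            frame = list(payload)  # needs no reference, so it resynchronises
        else:
            frame = [r + d for r, d in zip(ref, payload)]
        decoded.append(frame)
        ref = frame  # a corrupted frame becomes the next reference
    return decoded


def mismatched_frames(frames, decoded):
    """Indices at which the decoder output differs from the source."""
    return [n for n, (a, b) in enumerate(zip(frames, decoded)) if a != b]
```

For example, with eight frames `[[n, 2 * n] for n in range(8)]`, a key frame every four frames and frame 1 lost, `mismatched_frames` reports `[1, 2, 3]`: the error persists through every P-frame that depends on the corrupted reference and clears only at the I-frame at index 4.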
Typically, re-synchronization of the decoder state can be achieved by requesting an Instantaneous Decoder Refresh (IDR), enabling a Reference Picture Selection (RPS) feature or waiting for a key frame.
An IDR can be requested by a receiver. When the IDR frame is received without loss and subsequently decoded, it completely synchronizes the encoder-decoder state. However, frequent IDR requests can exhaust the available bandwidth, leading to poor quality, frame freezes and violations of the constant bitrate constraints of some applications such as video telephony.
The RPS feature allows a receiver, upon encountering an incomplete frame, to indicate to an encoder that one of its previously received correct frames is to be used as a new reference frame for the next frame in the encoder pipeline. However, RPS suffers from issues when the round trip delay between the encoder and decoder is greater than the duration for which reference frames are stored in the encoder, because the requested frame may no longer be available by the time the request arrives. RPS also tends to reduce encoder efficiency due to prediction from older frames.
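The round trip delay limitation can be made concrete with a small model. The sketch below is a simplification under stated assumptions: the encoder keeps a fixed number of its most recent frames as candidate references, frames are produced at a constant interval, and the receiver's request reaches the encoder one round trip after the frame that followed the last correctly received frame was sent. The class and function names are hypothetical and do not correspond to any codec API.

```python
import math
from collections import deque


class EncoderReferenceBuffer:
    """Holds the frame numbers of the most recent reference frames."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # oldest entries drop out

    def push(self, frame_no):
        self.frames.append(frame_no)

    def holds(self, frame_no):
        return frame_no in self.frames


def rps_request_succeeds(last_good, rtt_ms, frame_interval_ms, capacity):
    """Return True if `last_good` is still stored when the request arrives."""
    # Frames the encoder produces while the feedback is in flight.
    frames_in_flight = math.ceil(rtt_ms / frame_interval_ms)
    # last_good + 1 is the incomplete frame that triggered the request.
    newest = last_good + 1 + frames_in_flight
    buffer = EncoderReferenceBuffer(capacity)
    for frame_no in range(newest + 1):
        buffer.push(frame_no)
    return buffer.holds(last_good)
```

In this model, at 25 frames per second (a 40 ms interval) with five stored reference frames, a request for frame 10 succeeds with a 100 ms round trip but fails with a 400 ms round trip, since frame 10 has been evicted before the request arrives.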
Key frames, which can be protected by forward error correction (FEC), ensure regular encoder-decoder synchronization, provided they do not have missing packets. Packet losses in non-key frames will temporarily cause loss of synchronization until the next key frame. However, key frames tend to reduce encoding efficiency, since they are coded without prediction from other frames and therefore require more bits than predictive frames. Furthermore, the FEC feature is proprietary and is not supported in standards.
Other approaches used to solve the decoder state synchronization problem include halting decoding until frame completion. This mechanism for incomplete frame handling involves waiting for retransmitted or repair packets to arrive before continuing to decode the incomplete frame. However, this causes the displayed video to pause and appear to stutter. Another approach is to estimate the typical retransmission time for a missing packet and to increase the buffering delay accordingly. However, this approach leads to a loss of real-time behaviour and interactivity, making it unsuitable for applications such as video telephony and video conferencing.
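The buffering approach can be sketched as follows. The text does not say how the typical retransmission time is estimated; this illustration assumes a high percentile of recently observed retransmission times is added to a base delay. The function name and its parameters are hypothetical.

```python
def buffering_delay_ms(base_delay_ms, retransmission_times_ms, percentile=0.95):
    """Base delay plus a percentile of observed retransmission times.

    Returns the base delay unchanged when no retransmissions were observed.
    """
    if not retransmission_times_ms:
        return base_delay_ms
    ordered = sorted(retransmission_times_ms)
    # Index of the chosen percentile, clamped to the last sample.
    index = min(len(ordered) - 1, int(percentile * len(ordered)))
    return base_delay_ms + ordered[index]
```

With a 50 ms base delay and observed retransmission times of 80, 100, 120, 90 and 300 ms, the 95th percentile setting yields a 350 ms delay. Every frame is then displayed that much later, which illustrates why the approach conflicts with interactive use.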
Furthermore, in Multipoint Control Unit (MCU) centric video conferencing, where multiple receivers can experience packet loss at different times, frequent IDR frames may be generated, leading to poor quality. Decoder synchronization by RPS also may not be feasible with a single encoder instance; it may require multiple encoder instances, one specific to each end-point. In MCU-less video conferencing, where one sender sends the same packets to multiple receivers, it may be sub-optimal to generate IDR frames for each receiver's packet loss. In live video User Datagram Protocol (UDP) streaming to multiple receivers, similar real-time constraints and issues also apply.