A stream of video data to be encoded is illustrated schematically in FIG. 1. The stream comprises multiple frames (F) 101, 102, 103, each representing the video image at a different respective moment in time. As will be familiar to a person skilled in the art, for the purpose of encoding, each frame (F) 101, 102, 103 is divided into portions, and each portion may also be subdivided into smaller sub-portions, each portion or sub-portion comprising a plurality of pixels. For example, according to one terminology each frame of a video stream to be encoded is divided into macroblocks (MB) 104 comprising multiple pixels (e.g. each macroblock 104 may be a region of 8×8 pixels).
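By way of illustration only, the division of a frame into macroblocks might be sketched as follows. The function name `split_into_macroblocks`, the representation of a frame as a list of pixel rows, and the requirement that the frame dimensions be exact multiples of the block size are all assumptions made for this sketch and are not taken from any particular codec.

```python
def split_into_macroblocks(frame, block_size=8):
    """Split a frame (list of pixel rows) into block_size x block_size macroblocks.

    Returns a dict mapping (block_row, block_col) to a list of pixel rows.
    Assumption: frame dimensions are exact multiples of block_size.
    """
    height = len(frame)
    width = len(frame[0])
    if height % block_size or width % block_size:
        raise ValueError("frame dimensions must be multiples of block_size")
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top // block_size, left // block_size)] = [
                row[left:left + block_size]
                for row in frame[top:top + block_size]
            ]
    return blocks


# Hypothetical 16x16 frame whose pixel value encodes its position.
frame = [[y * 16 + x for x in range(16)] for y in range(16)]
macroblocks = split_into_macroblocks(frame)
```

With the default block size of 8, the 16×16 example frame yields four macroblocks, each holding 8 rows of 8 pixels.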
An encoder at a transmitter encodes the video data for transmission to a receiver over a packet-based network. A decoder at the receiver is then able to decode the encoded video data. The general term for the encoding/decoding method employed is a codec.
In some systems, the decoder at the receiver may be arranged to send feedback to the encoder of the transmitter via a feedback channel, which may use the same packet-based network.
A goal of a video codec is to reduce the bit rate needed to transmit a video signal while maintaining the highest possible quality. This goal is achieved by exploiting statistical redundancies (similarities in the video signal) and perceptual irrelevancies (related to the sensitivity of the human visual system).
Most of today's video codecs are based on an architecture that includes prediction of pixel blocks from other pixel blocks, transform of prediction residuals, quantization of transform coefficients, and entropy coding of quantization indices. These steps contribute to reducing redundancies and irrelevancies.
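The prediction, transform and quantization steps can be illustrated with a deliberately simplified one-dimensional sketch. The unnormalized DCT-II used here, the step size of 8 and the sample pixel values are assumptions chosen for illustration; practical codecs operate on two-dimensional blocks with integer transforms, and the entropy coding stage is omitted.

```python
import math


def dct_1d(samples):
    """Unnormalized 1-D DCT-II of a list of samples."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(samples))
        for k in range(n)
    ]


def quantize(coefficients, step):
    """Map transform coefficients to integer quantization indices."""
    return [round(c / step) for c in coefficients]


# Hypothetical row of pixels and its prediction from another block.
current_row = [52] * 8
predicted_row = [48] * 8

# Prediction residual: what the prediction failed to capture.
residual = [c - p for c, p in zip(current_row, predicted_row)]

# Transform the residual, then quantize the coefficients.
coefficients = dct_1d(residual)
indices = quantize(coefficients, step=8)
```

In this example the residual is constant, so all of its energy falls into the first coefficient and the remaining quantization indices are zero. Long runs of zeros of this kind are what make the subsequent entropy coding effective.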
The prediction can typically be performed from pixels in video frames other than the current frame (inter prediction) and from pixels in the same frame (intra prediction). That is, an intra-coded frame is encoded using only information in that frame itself. For example, spatial redundancies across the frame image can be exploited using known techniques such as a discrete cosine transform. Frames encoded in this way are referred to as I-frames.
An inter-encoded frame, on the other hand, is encoded using information from frames other than itself. That is, an inter-encoded frame may only indicate the differences between the frame and a previous frame. Hence an inter-encoded frame requires fewer bits than encoding absolute pixel values, and so saves on bitrate. Inter-encoded frames may be referred to as P-frames (though other types of inter-encoded frames exist and are known in the art, only P-frames are referred to herein for the sake of clarity).
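The idea that an inter-encoded frame carries only differences relative to a previous frame can be sketched as below. The functions `encode_p_frame` and `decode_p_frame` are hypothetical names, and the sketch uses a plain pixel-wise difference with no motion compensation, which is a simplification of what real encoders do.

```python
def encode_p_frame(current, reference):
    """Return the pixel-wise difference between current and reference frames."""
    return [
        [c - r for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


def decode_p_frame(residual, reference):
    """Rebuild the current frame by adding the residual to the reference."""
    return [
        [d + r for d, r in zip(res_row, ref_row)]
        for res_row, ref_row in zip(residual, reference)
    ]


# Hypothetical 4x4 frames: only one pixel changes between them.
previous_frame = [[100] * 4 for _ in range(4)]
current_frame = [row[:] for row in previous_frame]
current_frame[2][1] = 110

residual = encode_p_frame(current_frame, previous_frame)
nonzero = sum(1 for row in residual for d in row if d != 0)
reconstructed = decode_p_frame(residual, previous_frame)
```

Only one of the sixteen residual values is non-zero here, which is why such a frame is cheaper to encode than absolute pixel values. The sketch also shows why a lost reference frame is harmful: the decoder cannot reconstruct the current frame correctly without it.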
Intra prediction encoding typically requires more bits than inter prediction, though it still represents a saving over encoding absolute values. Details of suitable inter and intra encoding techniques for video will be familiar to a person skilled in the art.
In a conventional system, the feedback channel may be used to enable the encoder on the transmitter to determine that a frame was experienced as lost at the receiver. There are two ways in which this may be achieved. Firstly, in a negative feedback scheme, the receiver may signal back to the encoder on the transmitter that a frame was experienced as lost at the receiver (a loss report). Secondly, in a positive feedback scheme, the feedback channel may be used to signal back to the encoder that a frame was successfully received at the receiver (an acknowledgement); the encoder may thus determine that a frame was lost at the receiver when it does not receive an acknowledgement. Typically, a lost frame causes severe distortions in the decoded video that can last for a long time unless actions are taken. One such action is to force the encoder to generate a “recovery frame” that will stop error propagation when received and decoded.
A frame may be deemed “lost” at the receiver when it is not successfully received and/or not successfully decoded by the receiver. Hence, a frame may be “lost” at the receiver due to packet loss on the network. Alternatively, a frame may be “lost” at the receiver due to corruption of frame data (i.e. the frame was received by the receiver, but the received frame contains data errors which result in it not being decodable), and the corruption cannot be corrected using error correction. In general, a frame (or more generally a portion) may be considered lost at the receiver if it has not been both received and decoded at the receiver.
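The two feedback schemes described above differ in how the encoder learns of a loss, which might be sketched as follows. The use of a fixed acknowledgement timeout in the positive scheme, the millisecond timestamps and all function and variable names are assumptions for illustration; the description does not specify how long the encoder waits for an acknowledgement.

```python
def lost_frames_negative(loss_reports):
    """Negative feedback: the receiver explicitly reports each lost frame."""
    return sorted(set(loss_reports))


def lost_frames_positive(sent_at_ms, acked, now_ms, timeout_ms):
    """Positive feedback: a frame is presumed lost if it has not been
    acknowledged within timeout_ms of being sent (assumed policy)."""
    return sorted(
        frame_id
        for frame_id, sent in sent_at_ms.items()
        if frame_id not in acked and now_ms - sent >= timeout_ms
    )


# Hypothetical send times (ms) for four frames; frame 3 is never acknowledged.
sent_at_ms = {1: 0, 2: 33, 3: 66, 4: 100}
acked = {1, 2, 4}

too_early = lost_frames_positive(sent_at_ms, acked, now_ms=250, timeout_ms=200)
after_timeout = lost_frames_positive(sent_at_ms, acked, now_ms=500, timeout_ms=200)
reported = lost_frames_negative([3, 3])
```

In the positive scheme, frame 3 is only deemed lost once the assumed timeout has elapsed, whereas in the negative scheme the loss is known as soon as the report arrives.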
In a negative feedback scheme the recovery frame is a key-frame (i.e. all intra coded).
In a positive feedback scheme, the encoder is informed of every successfully received frame and hence has information pertaining to the last frame successfully decoded by the decoder. Hence, in this scheme the recovery frame may also be a frame that is inter coded with respect to an error-free frame known to be available in the decoder (known to be error free because it has itself been acknowledged as received and anything else relevant in its history has been acknowledged). An inter-coded recovery frame of this kind generally results in a lower bitrate at a given quality than a key-frame. The disadvantages associated with sending a recovery frame are bitrate overshoots (rate spikes) or, alternatively, an increase in source coding distortion. Bitrate overshoots can in turn cause new losses or force the encoder to drop frames, and a drastic increase in source coding distortion might be perceptually disturbing.
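The choice of recovery frame under the two schemes might be sketched as follows. The representation of each frame's prediction history as a single reference per frame, the policy of picking the most recent error-free frame, and the names `is_error_free` and `choose_recovery_frame` are assumptions made for this sketch.

```python
def is_error_free(frame_id, references, acked):
    """True if the frame and every frame in its prediction history
    have been acknowledged."""
    while frame_id is not None:
        if frame_id not in acked:
            return False
        frame_id = references[frame_id]
    return True


def choose_recovery_frame(scheme, references, acked):
    """Return ("inter", reference_id) or ("key", None)."""
    if scheme == "positive":
        candidates = [
            f for f in references if is_error_free(f, references, acked)
        ]
        if candidates:
            # Assumed policy: predict from the most recent error-free frame.
            return ("inter", max(candidates))
    # Negative feedback, or no usable reference: send an all-intra key-frame.
    return ("key", None)


# Hypothetical history: frame 1 is a key-frame, each later frame
# is predicted from the one before it. Frame 3 was never acknowledged.
references = {1: None, 2: 1, 3: 2, 4: 3}
acked = {1, 2, 4}

positive_choice = choose_recovery_frame("positive", references, acked)
negative_choice = choose_recovery_frame("negative", references, acked)
```

Frame 4 is acknowledged in this example but is not a valid reference, because frame 3 in its history was not acknowledged; the sketch therefore falls back to frame 2 under positive feedback and to a key-frame under negative feedback.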