Various products, such as digital cameras and digital video cameras, are used to capture images and video. These products contain an image sensing device, such as a charge coupled device (CCD), which is used to capture light energy focussed on the image sensing device. The captured light energy, which is indicative of a scene, is then processed to form a digital image. Various formats are used to represent such digital images or videos. Formats used to represent video include Motion JPEG, MPEG2, MPEG4 and H.264.
All the formats listed above are compression formats which offer high quality and increase the number of video frames that can be stored on a given medium. However, the above formats all have long encoding runtimes.
A complex encoder requires complex hardware. Complex encoding hardware has a high design and manufacturing cost, as well as a relatively large physical size. Furthermore, long encoding runtimes limit the rate at which video frames can be captured without overflowing a temporary buffer. Additionally, more complex encoding hardware consumes more battery power.
Wyner Ziv coding, or “distributed video coding”, is a form of coding in which an input video stream is usually split into key frames and non-key frames. The key frames are compressed using a conventional coding scheme, such as Motion JPEG, MPEG2, MPEG4 or H.264, and the decoder decodes the key frames conventionally. The decoder then uses the key frames to predict the non-key frames; in effect, the decoder performs the motion estimation that is usually performed at the encoder. The visual quality of the predicted non-key frames is then improved using the information the encoder provides for the non-key frames.
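As a minimal, hypothetical sketch of the split described above (the `gop_size` parameter and the every-Nth-frame rule are illustrative assumptions, not part of any cited scheme):

```python
# Hypothetical sketch of the key / non-key frame split described above.
# The rule "every gop_size-th frame is a key frame" is an assumption for
# illustration; real systems may choose key frames by other criteria.
def split_frames(frames, gop_size=4):
    """Return ([(index, key_frame), ...], [(index, non_key_frame), ...])."""
    key, non_key = [], []
    for i, frame in enumerate(frames):
        (key if i % gop_size == 0 else non_key).append((i, frame))
    return key, non_key
```

The key frames would then go to the conventional encoder, while the non-key frames would follow the Wyner Ziv (parity) path.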
The visual quality of the decoded video stream depends heavily on the quality of the prediction of the non-key frames and on the level of quantization of the image pixel values. The prediction is often a rough estimate of the original frame, generated from sub-sampled images and/or adjacent frames, e.g., through motion estimation and interpolation. The mismatch between the prediction and the decoded values is corrected by channel coding techniques, such as the generation of parity bits. Each parity bit carries some information about one or more information bits in the original frames. The bit rate of this parity bit stream can be varied to achieve the rate-distortion performance desired for a specific application.
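To make the parity-bit idea concrete, the toy sketch below uses a Hamming(7,4) code, chosen purely for illustration (practical distributed video coding systems use much stronger codes, such as turbo or LDPC codes). The encoder sends only the three parity bits of each 4-bit group; the decoder combines them with its possibly erroneous 4-bit prediction and corrects up to one bit error:

```python
# Toy illustration in the Wyner Ziv spirit: the encoder transmits only
# parity bits; the decoder already holds a prediction of the data bits.
# Hamming(7,4) layout: positions 1..7 are p1 p2 d1 p3 d2 d3 d4.

def parity_bits(d):
    """Three Hamming(7,4) parity bits for the 4 data bits d."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]

def correct(pred, parity):
    """Correct up to one bit error in the 4-bit prediction using parity."""
    d = list(pred)
    p1, p2, p3 = parity_bits(d)
    # Syndrome: which received parity bits disagree with recomputed ones.
    syndrome = (p1 ^ parity[0]) * 1 + (p2 ^ parity[1]) * 2 + (p3 ^ parity[2]) * 4
    # Syndrome equals the codeword position of the flipped bit;
    # positions 3, 5, 6, 7 hold the data bits d1..d4.
    pos = {3: 0, 5: 1, 6: 2, 7: 3}
    if syndrome in pos:
        d[pos[syndrome]] ^= 1
    return d
```

Here each parity bit indeed "carries some information about" several data bits at once, which is what allows the decoder to locate and flip the erroneous prediction bit.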
Feedback channels are often employed to perform rate control in distributed video coding systems. An encoder typically generates the parity bit stream and temporarily stores the generated bit stream in a buffer for later transmission. Initially, a small number of parity bits is transmitted to the decoder for error correction. If decoding is unsuccessful, the decoder requests more parity bits from the encoder through the feedback channel, and the decoding process restarts. The decoder continues to request more parity bits until it has accumulated enough parity bits to correct the bit errors in the prediction. However, the multiple requests result in a long decoding time, and the decoder is relatively complex.
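A minimal sketch of that request loop, with `try_decode` standing in for the actual channel decoder and `chunk` for the increment of parity bits sent per request (both are assumptions for illustration):

```python
# Hedged sketch of the feedback-channel loop described above: the encoder
# has buffered all parity bits; the decoder requests them in increments
# until the (placeholder) channel decoder reports success.
def decode_with_feedback(parity_buffer, try_decode, chunk=8):
    """Return (received_parity, number_of_requests), or (None, requests)
    if the buffer is exhausted without successful decoding."""
    received, requests = [], 0
    for start in range(0, len(parity_buffer), chunk):
        received.extend(parity_buffer[start:start + chunk])  # one request
        requests += 1
        if try_decode(received):  # decoding restarts with more parity
            return received, requests
    return None, requests
```

The request count directly reflects the round trips over the feedback channel that make the overall decoding time long.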
As an alternative to having a feedback channel, some recent distributed video coding systems implement a rate estimation function to control the bit rate of the parity bits. The rate estimation function typically involves computing the bit error probability and the conditional entropy on a per-bit-plane basis, under the assumption of a well-defined noise distribution model (e.g., Laplacian). A disadvantage of this rate estimation function is that it relies heavily on the assumed noise distribution model. The parameters of the noise distribution model have a significant impact on the performance of the system, and any inaccuracy in the estimated noise parameters may lead to poor decoding performance. For good performance, the noise parameters need to be estimated online and computed for each bit plane. This significantly increases the complexity of the encoder and defeats the purpose of distributed video coding, namely a simple encoder.
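A hedged sketch of such a model-based estimate follows. It assumes the prediction noise is Laplacian with scale `b`, that a bit in plane `k` (step size `2**k`) is flipped when the noise magnitude exceeds half that step, and that the parity rate per source bit is approximated by the binary entropy of the resulting crossover probability. All three assumptions are modeling choices for illustration, not part of any specific system:

```python
import math

def crossover_prob(b, step):
    """P(|N| > step / 2) for zero-mean Laplacian noise with scale b."""
    return math.exp(-(step / 2) / b)

def binary_entropy(p):
    """H(p) in bits; 0 at the endpoints by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def parity_rate(b, bit_plane):
    """Estimated parity bits per source bit for a given bit plane,
    clamping the crossover probability to 0.5 (a fully noisy bit)."""
    p = min(crossover_prob(b, step=2 ** bit_plane), 0.5)
    return binary_entropy(p)
```

The sketch also exposes the weakness noted above: the output depends entirely on the scale parameter `b`, so a misestimated `b` shifts the rate for every bit plane.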