Various products, such as digital cameras and digital video cameras, are used to capture images and video. These products contain an image sensing device, such as a charge coupled device (CCD), which is used to capture light energy focused on the image sensing device. The captured light energy, which is indicative of a scene, is then processed to form a digital image. Various formats are used to represent such digital images or videos. Formats used to represent video include Motion JPEG, MPEG2, MPEG4 and H.264.
A common feature of the formats listed above is that they are each compression formats. While those formats offer high quality and increase the number of video frames that can be stored on a given medium, they typically suffer from long encoding runtime.
A complex encoder requires complex hardware. Complex encoding hardware is disadvantageous in terms of design cost, manufacturing cost and physical size of the encoding hardware. Furthermore, long encoding runtime limits the rate at which video frames can be captured without overflowing a temporary buffer. Additionally, more complex encoding hardware has higher energy consumption. As longer battery life is highly desirable for a mobile device, it is desirable that battery energy consumption be minimized in mobile devices.
To minimize the complexity of the encoder, Wyner-Ziv coding, or “distributed video coding”, may be used. In distributed video coding the complexity of the encoder is shifted to the decoder. The input video stream is usually split into key frames and non-key frames. The key frames are compressed using a conventional coding scheme, such as Motion JPEG, MPEG2, MPEG4 or H.264, and the decoder operates to conventionally decode the key frames. With the help of the decoded key frames, the non-key frames are predicted. The processing at the decoder is thus equivalent to carrying out motion estimation, which is usually performed at the encoder. The predicted non-key frames are improved in terms of visual quality with the information that the encoder provides for the non-key frames.
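The split of the input stream into key frames and non-key frames can be illustrated with a minimal sketch. The function name `split_frames` and the parameter `key_interval` are hypothetical, and the fixed alternating pattern is only an assumption made for illustration; the text above does not specify how frames are assigned.

```python
def split_frames(frames, key_interval=2):
    """Split a frame sequence into key and non-key frames.

    Assumption: every `key_interval`-th frame is a key frame. Real
    distributed video coders may choose key frames differently.
    """
    key, non_key = [], []
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            key.append((index, frame))      # conventionally coded
        else:
            non_key.append((index, frame))  # predicted at the decoder
    return key, non_key


key, non_key = split_frames(["f0", "f1", "f2", "f3", "f4"])
print([i for i, _ in key], [i for i, _ in non_key])
```

With the default interval, each non-key frame sits between two key frames, which is the arrangement that makes prediction from adjacent decoded frames possible.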
The visual quality of the decoded video stream depends heavily on the quality of the prediction of the non-key frames and the level of quantization applied to the image pixel values. The prediction is often a rough estimate of the original frame, generated from adjacent frames, for example through motion estimation and interpolation. When there is a mismatch between the prediction and the decoded values, some form of compromise is required to resolve the differences.
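How a prediction can disagree with a decoded quantized value may be seen in a small sketch. It assumes the simplest possible predictor, a pixel-wise average of the two adjacent frames with no motion estimation, and a uniform quantizer; the names `interpolate_frame` and `quantize` and the step size of 16 are illustrative assumptions only.

```python
def interpolate_frame(prev_frame, next_frame):
    """Predict a frame as the rounded pixel-wise mean of its neighbours.

    Assumption: plain averaging stands in for motion-compensated
    interpolation, which a practical decoder would normally use.
    """
    return [
        [(a + b + 1) // 2 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def quantize(pixel, step):
    """Return the index of the uniform quantization bin holding `pixel`."""
    return pixel // step


prev_frame = [[40, 60]]
next_frame = [[60, 60]]
original = [[37, 61]]  # the true non-key frame, seen only by the encoder

predicted = interpolate_frame(prev_frame, next_frame)
step = 16
for orig_px, pred_px in zip(original[0], predicted[0]):
    # A mismatch occurs when the two values fall in different bins.
    print(orig_px, pred_px, quantize(orig_px, step) == quantize(pred_px, step))
```

In this example the first pixel is predicted as 50 while the original value 37 lies in a different quantization bin, which is the kind of mismatch that has to be resolved.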
The objective of distributed video coding is to correct both prediction errors and error-correction errors. Some prior art addresses this objective by employing a frame reconstruction function after performing Wyner-Ziv decoding. If the predicted value is within the range of the decoded quantized symbol, the reconstructed pixel value is made equal to the predicted value; otherwise the reconstructed value is set equal to the upper bound or the lower bound of the quantized symbol, depending on the magnitude of the predicted value. This approach has the advantage of minimizing decoding errors and eliminating large positive or negative errors, which are highly perceptible to human senses. However, such an approach is considered to be sub-optimal.
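The prior-art reconstruction rule described above amounts to clamping the predicted value to the interval of the decoded quantized symbol. The following is a minimal sketch under stated assumptions: a uniform quantizer with integer pixel values, where symbol `q` covers the range from `q * step` to `q * step + step - 1`. The function name `reconstruct` and its parameters are hypothetical.

```python
def reconstruct(predicted, symbol, step):
    """Clamp a predicted pixel value to the decoded quantization bin.

    Assumption: uniform quantizer, bin `symbol` spans
    [symbol * step, symbol * step + step - 1] inclusive.
    """
    lower = symbol * step
    upper = lower + step - 1
    if predicted < lower:
        return lower       # prediction too small: use the lower bound
    if predicted > upper:
        return upper       # prediction too large: use the upper bound
    return predicted       # prediction agrees with the decoded symbol


# Decoded symbol 2 with step 16 covers pixel values 32 to 47.
print(reconstruct(40, 2, 16))  # inside the bin, prediction is kept
print(reconstruct(50, 2, 16))  # above the bin, clamped to the upper bound
print(reconstruct(10, 2, 16))  # below the bin, clamped to the lower bound
```

Because the output never leaves the decoded bin, the reconstruction error is bounded by the bin width, which is why large positive or negative errors are avoided.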