Various products, such as digital cameras and digital video cameras, are used to capture images and video. These products contain an image-sensing device, such as a charge coupled device (CCD), which captures light energy focused on the image-sensing device. The captured light energy, which is indicative of a scene, is then processed to form a digital image. Various formats are used to represent such digital images or videos. Formats used to represent video include Motion JPEG, MPEG2, MPEG4 and H.264.
All the formats listed above are compression formats. While those formats offer high quality and increase the number of video frames that can be stored on a given medium, they typically suffer from long encoding runtimes.
A complex encoder typically requires complex hardware. Complex encoding hardware is in turn disadvantageous in terms of design cost, manufacturing cost and physical size. Furthermore, a long encoding runtime limits the rate at which video frames can be captured without overflowing a temporary buffer. Additionally, more complex encoding hardware consumes more battery power. As battery life is important for a mobile device, it is desirable that battery consumption be minimized in mobile devices.
To minimize the complexity of the encoder, Wyner-Ziv coding, also referred to as “distributed video coding”, may be used. In distributed video coding, complexity is shifted from the encoder to the decoder.
In one example of distributed video coding, the input video stream is split into key frames and non-key frames. The key frames are compressed using a conventional coding scheme, such as Motion JPEG, MPEG2, MPEG4 or H.264. The decoder decodes the key frames in a conventional manner. The key frames are also referred to as “reference frames” in this specification. With the help of the key frames the non-key frames are predicted. The processing at the decoder is thus equivalent to carrying out motion estimation, which is usually performed at the encoder. The decoder improves the visual quality of the predicted non-key frames using error correction information provided by the encoder. The predicted non-key frame is also called the side information for the error correction.
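The key/non-key split and the decoder-side prediction described above can be illustrated with a minimal sketch. It assumes frames are flat lists of integer pixel values, that every second frame is a key frame, and that a non-key frame is predicted by simply averaging the nearest decoded key frames on either side. The averaging is a stand-in for motion-compensated interpolation, not the method of any particular codec, and the names `split_frames`, `predict_non_key` and `key_interval` are hypothetical.

```python
def split_frames(frames, key_interval=2):
    """Split a frame sequence into key and non-key frames.

    Returns two dicts mapping frame index -> frame. The fixed
    key_interval is an assumption made for illustration only.
    """
    key = {i: f for i, f in enumerate(frames) if i % key_interval == 0}
    non_key = {i: f for i, f in enumerate(frames) if i % key_interval != 0}
    return key, non_key


def predict_non_key(index, decoded_keys):
    """Predict a non-key frame (the side information) from decoded key frames.

    Averages the nearest key frame before and after `index`. A practical
    decoder would use motion estimation and interpolation instead.
    """
    earlier = [i for i in decoded_keys if i < index]
    later = [i for i in decoded_keys if i > index]
    if not earlier and not later:
        raise ValueError("no decoded key frames available")
    # Fall back to the only available neighbour at the sequence edges.
    prev_frame = decoded_keys[max(earlier)] if earlier else decoded_keys[min(later)]
    next_frame = decoded_keys[min(later)] if later else prev_frame
    return [(a + b) // 2 for a, b in zip(prev_frame, next_frame)]


frames = [[10, 20], [12, 22], [14, 24]]
key_frames, non_key_frames = split_frames(frames)
side_information = predict_non_key(1, key_frames)
```

In this sketch the encoder never performs motion estimation. It would only compress the key frames conventionally and send error correction information for the non-key frames, while the decoder carries the cost of forming `side_information`.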
The visual quality of the decoded video stream depends heavily on the quality of the prediction of the non-key frames and the level of quantization of the key frame image pixel values. The prediction of a non-key frame is often only a rough estimate of the original non-key frame. This estimate is generated from adjacent frames, such as the key frames, through motion estimation and interpolation. When there is a significant mismatch between a predicted non-key frame and the associated decoded key frame, it is necessary to resolve the mismatch.
In distributed video coding, both the prediction errors (i.e. errors in the predicted non-key frames) and the error correction failures have to be rectified. Prior art approaches address these issues with a frame reconstruction function that is performed after the Wyner-Ziv decoding. If the value of a predicted pixel (i.e. a pixel in a predicted non-key frame) is within a specified range of the associated decoded pixel (i.e. the pixel in the corresponding key frame), then the value of the reconstructed pixel is made equal to the value of the predicted pixel. Otherwise, the value of the reconstructed pixel is set equal to a pre-defined upper or lower bound of the decoded pixel, depending on the magnitude of the predicted value. This approach has the advantage of minimizing decoding errors and eliminating large positive or negative errors that are highly perceptible to the human eye. However, the solution is considered to be sub-optimal.
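The prior art reconstruction rule amounts to clamping each predicted pixel to a range around the decoded pixel. The following is a minimal sketch under stated assumptions: the range is taken to be symmetric (`decoded ± tolerance`), pixels are integers, and frames are flat lists. The source does not specify how the bounds are defined, so `tolerance` and the function names are hypothetical.

```python
def reconstruct_pixel(predicted, decoded, tolerance):
    """Apply the reconstruction rule to a single pixel.

    Keeps the predicted value when it lies within the allowed range of
    the decoded value; otherwise clamps to the nearer bound. The
    symmetric range is an illustrative assumption.
    """
    lower = decoded - tolerance
    upper = decoded + tolerance
    if lower <= predicted <= upper:
        return predicted
    # Outside the range: choose the bound on the same side as the prediction.
    return upper if predicted > upper else lower


def reconstruct_frame(predicted_frame, decoded_frame, tolerance):
    """Apply reconstruct_pixel to every pixel pair of two equal-sized frames."""
    return [
        reconstruct_pixel(p, d, tolerance)
        for p, d in zip(predicted_frame, decoded_frame)
    ]


reconstructed = reconstruct_frame([100, 130, 60], [105, 100, 100], tolerance=8)
```

With these inputs the first pixel is kept because it lies within the range, while the second and third are clamped to the upper and lower bounds respectively. The clamping bounds the error, but a clamped pixel is always placed exactly on a bound regardless of how plausible the prediction was.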