Various products, such as digital cameras and digital video cameras, are used to capture images and video. These products contain an image sensing device, such as a charge coupled device (CCD), which is used to capture light energy focussed on the image sensing device. The captured light energy, which is indicative of a scene, is then processed to form a digital image. Various formats are used to represent such digital images or videos. Formats used to represent video include Motion JPEG, MPEG2, MPEG4 and H.264.
All of the formats listed above are compression formats. While these formats offer high quality and increase the number of video frames that can be stored on a given medium, they typically suffer from long encoding runtimes.
A complex encoder requires complex hardware. Complex encoding hardware is in turn disadvantageous in terms of design cost, manufacturing cost and physical size. Furthermore, a long encoding runtime limits the rate at which video frames can be captured without overflowing a temporary buffer. Additionally, more complex encoding hardware consumes more battery power. As battery life is essential for a mobile device, it is desirable that power consumption be minimized in mobile devices.
To minimize the complexity of the encoder, Wyner-Ziv coding, or “distributed video coding”, may be used. In distributed video coding, the complexity of the encoder is shifted to the decoder. The input video stream is usually split into key frames and non-key frames. The key frames are compressed using a conventional coding scheme, such as Motion JPEG, MPEG2, MPEG4 or H.264, and the decoder conventionally decodes the key frames. With the help of the decoded key frames, the non-key frames are then predicted by the decoder. The processing at the decoder is thus equivalent to carrying out motion estimation, which is usually performed at the encoder. The visual quality of the predicted non-key frames is then improved using the information that the encoder provides for the non-key frames.
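The division of labour described above can be illustrated with a minimal sketch. It is not a Wyner-Ziv codec: the function names, the fixed `key_interval`, the representation of a frame as a flat list of pixel values, and the use of plain averaging of the neighbouring key frames in place of motion-compensated prediction are all assumptions made for illustration only.

```python
def split_frames(frames, key_interval=2):
    """Split a frame sequence into key and non-key frames by index.

    key_interval is a hypothetical parameter: every key_interval-th
    frame is treated as a key frame.
    """
    key_frames = {}
    non_key_frames = {}
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            key_frames[index] = frame
        else:
            non_key_frames[index] = frame
    return key_frames, non_key_frames


def predict_non_key_frame(index, decoded_key_frames):
    """Predict a non-key frame at the decoder from decoded key frames.

    Assumption: the prediction is the pixel-wise average of the nearest
    earlier and later key frames. A real decoder would typically use
    motion estimation here instead of plain averaging.
    """
    previous_index = max(i for i in decoded_key_frames if i < index)
    later = [i for i in decoded_key_frames if i > index]
    # At the end of the sequence there may be no later key frame.
    next_index = min(later) if later else previous_index
    previous_frame = decoded_key_frames[previous_index]
    next_frame = decoded_key_frames[next_index]
    return [(a + b) / 2.0 for a, b in zip(previous_frame, next_frame)]


# Toy sequence of five two-pixel frames with linearly increasing brightness.
frames = [[0, 0], [10, 10], [20, 20], [30, 30], [40, 40]]
key_frames, non_key_frames = split_frames(frames)
predictions = {
    index: predict_non_key_frame(index, key_frames)
    for index in non_key_frames
}
```

In this sketch the encoder side only selects frames, while all prediction work happens in `predict_non_key_frame`, which mirrors the shift of complexity from the encoder to the decoder. The correction of the prediction using encoder-supplied information is not modelled.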
The visual quality of the decoded video stream depends heavily on the quality of the prediction of the non-key frames. One prior art approach improves the quality of the prediction of the non-key frames by having the encoder obtain more information through partial motion estimation or other video analysis methods. This additional information is then transmitted to support the prediction carried out by the decoder. However, employing video analysis methods at the encoder increases the complexity of the encoder, which is undesirable.
Another approach employed to improve the quality of the information used by the decoder to predict non-key frames is spatially based motion smoothing. In this approach, a Laplace distribution model is employed to represent the temporal noise distribution in the smoothed frames. However, it has been found that the Laplace distribution model does not properly model the temporal noise distribution, because the tails of the Laplace distribution approach zero more slowly than those of the actual distribution of the empirical noise.
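The tail behaviour referred to above can be made concrete with a small sketch. A zero-mean Laplace distribution with scale b has the tail probability P(|X| > t) = exp(-t / b). For comparison only, the sketch uses a zero-mean Gaussian of equal variance as a hypothetical stand-in for a lighter-tailed distribution; it is an assumption chosen for illustration and is not claimed to be the actual empirical noise distribution. The function names and the scale value are likewise illustrative.

```python
import math


def laplace_tail(t, b):
    """Two-sided tail P(|X| > t) of a zero-mean Laplace distribution with scale b."""
    return math.exp(-t / b)


def gaussian_tail(t, sigma):
    """Two-sided tail P(|X| > t) of a zero-mean Gaussian with standard deviation sigma."""
    return math.erfc(t / (sigma * math.sqrt(2.0)))


b = 1.0                      # illustrative Laplace scale
sigma = b * math.sqrt(2.0)   # matches the Laplace variance, which is 2 * b**2

for t in (2.0, 5.0, 8.0):
    print(t, laplace_tail(t, b), gaussian_tail(t, sigma))
```

For large t the Laplace tail probability stays well above that of the equal-variance Gaussian, which illustrates how a Laplace model can overestimate the likelihood of large noise values when the actual noise has lighter tails.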