Digital still cameras and digital video cameras are popular optical devices used for capturing still images and videos. These devices contain an image sensing device, such as a charge coupled device (CCD), which captures light energy that is focussed on the image sensing device and is indicative of a scene. The captured light energy is processed to form a digital image. Various formats are used to represent such digital images or videos. Formats used to represent images and video include JPEG, JPEG2000, Motion JPEG, Motion JPEG2000, MPEG1, MPEG2, MPEG4 and H.264.
All of the formats listed above are compression formats. While these formats offer high quality and increase the number of images that can be stored on a given medium, they typically suffer from long encoding runtime. For a conventional format, such as JPEG, JPEG2000, Motion JPEG, Motion JPEG2000, MPEG1, MPEG2, MPEG4 or H.264, the encoding process is typically five to ten times more complex than the decoding process.
Multi-view images (and videos) typically refer to a set of overlapping images capturing a scene from different view positions. One or more cameras can be employed to take multi-view images. One common approach to compressing multi-view images is to encode the images (and videos) from different viewpoints independently using the aforementioned compression schemes. However, this approach does not exploit the correlation between different views, and often results in an enormous amount of redundant data being transmitted or stored in a storage device. An alternative approach is to exploit the disparity correlation between different views at the encoder. This is analogous to performing motion estimation at the encoder in conventional video coding. With this encoding approach, a joint encoder reads the captured images from all of the different viewpoints and performs inter-view predictive coding on the captured images. This coding scheme can achieve high coding efficiency, at the expense of a computationally complex encoder.
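As an illustration of the disparity estimation referred to above, the following is a minimal sketch of a block-matching search between two views, assuming rectified views (so that the search is purely horizontal) and a sum-of-absolute-differences cost. The function names `sad` and `block_disparity`, the block size and the search range are hypothetical choices made for this example only, and are not taken from any particular codec.

```python
def sad(ref, tgt, ry, rx, ty, tx, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(ref[ry + dy][rx + dx] - tgt[ty + dy][tx + dx])
    return total


def block_disparity(ref, tgt, y, x, size=4, max_disp=8):
    """Return (best_disparity, best_cost) for the block of tgt at (y, x).

    The block at (y, x) in tgt is compared against the blocks at
    (y, x + d) in ref for d = 0..max_disp (assumed horizontal-only
    search); the shift with the lowest SAD cost is returned.
    """
    width = len(ref[0])
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x + d + size > width:
            break  # candidate block would fall outside the reference view
        cost = sad(ref, tgt, y, x + d, y, x, size)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Synthetic example: the second view is the first view shifted by 3 pixels.
ref = [[(x * x + 3 * y) % 97 for x in range(16)] for y in range(4)]
tgt = [[ref[y][x + 3] for x in range(13)] for y in range(4)]
print(block_disparity(ref, tgt, 0, 0))  # (3, 0)
```

An exhaustive search of this kind, repeated for every block of every view, indicates why performing disparity estimation at the encoder contributes to encoder complexity.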
Wyner-Ziv coding, or “distributed video coding”, has recently been extended to stereo and multi-view imaging to address the shortcomings of conventional approaches. In a distributed video coding (DVC) scheme, the complexity is shifted from the encoder to the decoder. Typically, the set of input images taken at different viewpoints is split into two subsets. The first subset of images is compressed using a conventional coding scheme, such as JPEG, JPEG2000, Motion JPEG, Motion JPEG2000, MPEG1, MPEG2, MPEG4 or H.264, and the decoder decodes these images conventionally. The second subset of images, on the other hand, is encoded by channel coding methods and is predicted at the decoder from the conventionally encoded images. Such prediction processing is equivalent to carrying out inter-view disparity estimation, which in conventional schemes is typically performed at the multi-view image encoder. The visual quality of the predicted images is then further improved using parity information provided by the encoders.
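The principle of correcting a decoder-side prediction using a small amount of channel-code information can be illustrated with a toy sketch. The example below is an assumption-laden simplification and not the scheme of any system described here: it uses the syndrome of a (7,4) Hamming code in place of parity bits, operates on a single 7-bit word, and assumes that the prediction (the side information) differs from the source word in at most one bit. The names `syndrome`, `wz_encode` and `wz_decode` are hypothetical. Practical DVC systems use far longer channel codes.

```python
def syndrome(bits):
    """3-bit Hamming(7,4) syndrome of a 7-bit word, as an integer 0..7.

    Column i of the parity-check matrix is the binary representation of
    i (1-indexed), so the syndrome is the XOR of the positions of set bits.
    """
    s = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            s ^= pos
    return s


def wz_encode(x_bits):
    """Low-complexity encoder: send only the 3-bit syndrome of the 7-bit word."""
    return syndrome(x_bits)


def wz_decode(side_info, s_x):
    """Correct the side information using the received syndrome.

    Assumes side_info differs from the source word in at most one bit.
    """
    e = syndrome(side_info) ^ s_x  # syndrome of the difference pattern
    decoded = list(side_info)
    if e:
        decoded[e - 1] ^= 1  # flip the single differing bit
    return decoded


x = [1, 0, 1, 1, 0, 0, 1]   # source word, known only to the encoder
y = [1, 0, 1, 1, 1, 0, 1]   # decoder-side prediction, wrong in one bit
print(wz_decode(y, wz_encode(x)) == x)  # True
```

In this sketch the encoder transmits 3 bits rather than 7 and performs no prediction at all; the work of forming the prediction falls to the decoder, which mirrors the shift of complexity described above.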
One multi-view system using the distributed source coding paradigm has been developed based on an earlier DVC system using only one camera. This system first decodes all the views by performing a block-wise motion search in the temporal dimension. For each block that was not successfully decoded, the decoder then performs a disparity search on the available reconstruction using epipolar geometry correspondences.
Another technique suggests using disparity vector fields estimated on previously decoded frames to decode the current Wyner-Ziv frame. The system comprises a conventional camera that provides a view of a scene for disparity estimation and serves as side information for Wyner-Ziv decoding. Alternatively, the system decodes all the views independently using temporal side information. If temporal side information estimation fails, a disparity search is then performed on the available reconstruction.
The foregoing techniques and systems assume that all images are captured at a fixed resolution, and may perform sub-optimally when this condition is violated.