Most visual content today still takes the form of two-dimensional (2D) images or videos, where a video is a sequence of images. Conventional images and videos generally do not support changes of viewpoint beyond magnification (scaling) or simple shifting. With the advent of stereoscopic, or three-dimensional (3D), display technologies, active-shutter and passive polarized eyeglasses have become commonly available. More recently, high-resolution autostereoscopic displays, which do not require eyeglasses, have also become available. The input to such autostereoscopic displays is usually either i) a video plus a depth map describing the depth of each pixel in the video, or ii) a set of videos at adjacent viewpoints, sometimes called multi-view videos, which are multiplexed into a single image frame in a particular format. The lenticular lens or parallax barrier of an autostereoscopic display performs spatial filtering so that a user in a certain viewing zone sees two different images with his/her left and right eyes, respectively, thus creating a 3D perception.
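As a rough illustration of how multi-view videos can be "multiplexed into a single image frame," the sketch below interleaves N views column by column, so that output column x comes from view x mod N. This is a simplified, hypothetical layout chosen for clarity: actual autostereoscopic panels typically use slanted lenticular lenses and sub-pixel mappings specific to each display, and the function name `interleave_columns` is illustrative only.

```python
from typing import List, Sequence

# Hypothetical representation: an image is a list of rows of grayscale values.
Image = List[List[int]]


def interleave_columns(views: Sequence[Image]) -> Image:
    """Multiplex N equally sized views into one frame, column by column.

    Output column x is copied from view (x % N). This assumes a simple
    vertical-stripe layout, not any particular display's real pixel mapping.
    """
    if not views:
        raise ValueError("need at least one view")
    height = len(views[0])
    width = len(views[0][0]) if height else 0
    for v in views:
        if len(v) != height or any(len(row) != width for row in v):
            raise ValueError("all views must have the same dimensions")
    n = len(views)
    # Each output pixel is taken from the view assigned to its column.
    return [[views[x % n][y][x] for x in range(width)] for y in range(height)]
```

For example, interleaving a left view filled with 1s and a right view filled with 2s yields rows of the form `[1, 2, 1, 2, ...]`; under this assumed layout, the lens or barrier would then direct the odd and even columns toward different eyes.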
Displaying conventional 2D images or videos on a 3D display device requires generating another view of the scene. Likewise, displaying 3D videos on autostereoscopic displays requires generating either a depth map or appropriate multi-view videos to be multiplexed into the desired frame format. One method to facilitate the generation of these additional views is to augment the videos with corresponding depth maps or approximations of them. For conventional videos, augmenting each image frame with a depth map yields an additional depth video; this format is sometimes referred to as the 2D+Z representation, where Z denotes the depth value. View synthesis can then be performed to synthesize an arbitrary view from the 2D and depth videos. Image-domain warping is one of the methods used for view synthesis. See, United States Patent Publication No. 2013/0057644 A1 of N. Stefanoski and the article, Stefanoski et al., “Automatic view synthesis by Image-Domain-Warping,” IEEE Transactions On Image Processing, vol. 22, no. 9, pp. 3329-3341, (September 2013), which are incorporated herein by reference in their entirety. Because a depth map usually assigns a depth value to every pixel of the image frame, the depth map, and hence the depth video, can be very large. Efficient compression of multi-view depth map images and videos is therefore important for their storage and transmission. Moreover, because conventional depth maps contain only one value at each pixel location, the image-domain warping method may not be able to handle semi-transparent and reflective objects: such objects are matted with the background, so a single depth value cannot describe both the object and the background seen through or reflected in it.
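To make the idea of synthesizing a new view from 2D+Z data concrete, the sketch below uses simple forward warping (a basic depth-image-based rendering scheme), which is a swapped-in technique and not the image-domain warping method cited above. It assumes, hypothetically, that depth values lie in [0, 255] with larger values meaning nearer, that disparity is linear in depth up to a chosen `max_disparity`, and that the input is treated as the left-eye view, so pixels of a right-eye view shift left by their disparity. When several source pixels land on the same target, the nearest one wins; target pixels that receive nothing (disocclusions) are left as `None`.

```python
from typing import List, Optional


def synthesize_view(image: List[List[int]], depth: List[List[int]],
                    max_disparity: int) -> List[List[Optional[int]]]:
    """Forward-warp a 2D+Z frame into a horizontally shifted virtual view.

    Hypothetical assumptions: depth in [0, 255], larger = nearer; disparity
    is round(max_disparity * z / 255); pixels move left by their disparity.
    Holes (disoccluded regions) are returned as None, not filled.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    zbuf = [[-1] * width for _ in range(height)]  # nearest depth written so far
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            d = round(max_disparity * z / 255)
            tx = x - d
            # Keep the nearest source pixel when several map to one target.
            if 0 <= tx < width and z > zbuf[y][tx]:
                zbuf[y][tx] = z
                out[y][tx] = image[y][x]
    return out
```

For the single row `[10, 20, 30, 40, 50]` with depth `[0, 0, 255, 0, 0]` and `max_disparity=1`, the near pixel `30` moves one position left and occludes `20`, producing `[10, 30, None, 40, 50]` with a hole where the background was uncovered. Because each pixel carries exactly one depth value, a semi-transparent pixel is moved as a whole with that value, which illustrates why single-valued depth maps struggle with matted objects.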