In the proposed panoramic stereo video system, stereo panorama videos are shown on head-mounted displays (HMDs) to provide an immersive 3D experience. Two essential features that determine the user experience are the resolution and the persistence of the stereo video. In the proposed system, the stereo videos are stitched from 16 high-definition (HD) cameras, and the resolution is at least 3840×2160 (4K) for each view. With a frame rate of 50 fps, the proposed system can substantially reduce motion blur and flicker. However, the very high resolution and high refresh rate generate a tremendous amount of video data, which is a challenge for 3D video services and broadcasting.
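To give a sense of scale, the uncompressed data rate implied by these figures can be estimated with a short calculation. The sketch below is illustrative only: the 8-bit sample depth and 4:2:0 chroma subsampling (1.5 samples per pixel) are assumptions not stated in the text, and the function name `raw_bitrate_bps` is hypothetical. Because the resolution is given as "at least" 3840×2160 per view, the result is a lower bound under those assumptions.

```python
def raw_bitrate_bps(width, height, fps, views=2,
                    bits_per_sample=8, samples_per_pixel=1.5):
    """Estimate the uncompressed bit rate of a multi-view video in bits/s.

    bits_per_sample=8 and samples_per_pixel=1.5 (4:2:0 chroma subsampling)
    are assumptions made for illustration; they are not given in the text.
    """
    return width * height * samples_per_pixel * bits_per_sample * fps * views


# Two views (left and right eye) at 3840x2160 and 50 fps, as described above.
rate = raw_bitrate_bps(3840, 2160, 50)
print(f"{rate / 1e9:.2f} Gbit/s")  # prints "9.95 Gbit/s"
```

Under these assumptions the stereo pair alone amounts to roughly 10 Gbit/s before compression, which is why the coding efficiency discussed next matters.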
Modern hybrid video coding methods, such as H.264, VC-1 and HEVC, have achieved significant improvements in video coding efficiency in the last decade. Spatial and temporal redundancy in video sequences has been dramatically reduced by employing intensive spatial-temporal prediction. Recent 3D extensions, such as MV-HEVC and 3D-HEVC, further exploit disparity prediction between different views. However, to achieve better compression performance for stereo panorama videos, human visual characteristics and panorama-specific characteristics need to be considered as well in order to improve subjective video quality.
Generally speaking, a 360-degree panoramic image covers an elongated field of view, and there is a high probability that most of that field of view is background. Users are likely to pay attention to only a small part of the field, typically one with significant contrast in color, texture, movement, or depth.
The basic idea behind compression methods based on human visual characteristics is to encode a small number of selected attention regions with high priority to obtain high subjective video quality, while treating less interesting regions with low priority to save bits. To achieve this, an attention prediction method is often used to predict which regions the user is likely to pay attention to.
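One common way to realize such priority-based coding is to lower the quantization parameter (QP) for blocks predicted to attract attention and raise it elsewhere. The following is a minimal sketch of that idea and not the specific method of the proposed system: the function name `qp_map`, the linear mapping, and the parameters `base_qp` and `max_offset` are all hypothetical. The only external fact relied on is that the QP range for 8-bit H.264 and HEVC is 0 to 51.

```python
def qp_map(saliency, base_qp=32, max_offset=6, qp_min=0, qp_max=51):
    """Map per-block saliency in [0, 1] to per-block QP values.

    Salient blocks (saliency near 1) get a lower QP, i.e. finer quantization
    and more bits; background blocks (saliency near 0) get a higher QP.
    The linear mapping and default values are illustrative assumptions.
    """
    result = []
    for row in saliency:
        qp_row = []
        for s in row:
            # s = 0.5 leaves the QP unchanged; s = 1 subtracts max_offset,
            # s = 0 adds max_offset.
            qp = base_qp + round((0.5 - s) * 2 * max_offset)
            # Clamp to the valid QP range of the codec.
            qp_row.append(max(qp_min, min(qp_max, qp)))
        result.append(qp_row)
    return result


# A 1x2 grid of blocks: pure background on the left, fully salient on the right.
print(qp_map([[0.0, 1.0]]))  # prints [[38, 26]]
```

In practice the saliency values would come from the attention prediction step, and the offsets would be tuned against a rate budget; both are outside the scope of this sketch.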
Currently, 2D image saliency computation mainly considers feature contrast, such as contrast in color, shape, orientation, texture, curvedness, etc. In image sequences or videos, region of interest (ROI) detection focuses on motion information to separate the foreground from the background. However, current compression methods for videos are not suitable for stereo videos, as they do not consider the stereopsis contrast in the stereo videos. Moreover, when salient objects exhibit neither visual uniqueness in the spatial domain nor movement in the temporal domain, the ROI becomes challenging for existing methods to detect.
Therefore, there is a need for a new compression method for stereo videos in which texture, motion, and stereopsis contrast are explored at the same time for saliency analysis.
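To illustrate why considering the three cues jointly helps, the sketch below fuses per-block texture, motion, and disparity values into a single saliency score. It is a hedged illustration under stated assumptions, not the method of the proposed system: the global-contrast measure (absolute deviation from the mean), the min-max normalization, the equal weights, and the names `normalize`, `global_contrast`, and `combined_saliency` are all hypothetical choices.

```python
def normalize(values):
    """Min-max normalize a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def global_contrast(values):
    """Contrast of each block as its absolute deviation from the global mean."""
    mean = sum(values) / len(values)
    return [abs(v - mean) for v in values]


def combined_saliency(texture, motion, disparity, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of normalized texture, motion, and disparity contrast.

    Equal weights are an illustrative assumption, not a recommended setting.
    """
    cues = [normalize(global_contrast(c)) for c in (texture, motion, disparity)]
    return [sum(w * cue[i] for w, cue in zip(weights, cues))
            for i in range(len(texture))]


# Four blocks with identical texture and no motion; only the last block
# stands out in depth (disparity), so only the stereopsis cue can find it.
saliency = combined_saliency(
    texture=[10, 10, 10, 10],
    motion=[0, 0, 0, 0],
    disparity=[1, 1, 1, 5],
)
print(saliency.index(max(saliency)))  # prints 3
```

In this toy input the salient block has no spatial or temporal uniqueness, so texture-only or motion-only analysis would score all blocks equally, while the disparity term singles it out.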