Advances in digital video coding standards have led to the success of multimedia systems such as smartphones, digital TVs, and digital cameras over the past decade. After the standardization of H.261, MPEG-1, MPEG-2, H.263, MPEG-4, and H.264/AVC, the demand for improved video compression performance has remained strong owing to requirements for larger picture resolutions, higher frame rates, and better video quality. Therefore, the development of new video coding techniques offering better coding efficiency than H.264/AVC has continued. High Efficiency Video Coding (HEVC) is based on a hybrid block-based motion-compensated transform coding architecture.
Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology for 3D TV applications, among others. Conventional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. Multi-view video, however, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with a sensation of realism. 3D video formats may also include depth maps associated with the corresponding texture pictures. The depth maps also have to be coded in order to render three-dimensional or multi-view video. Owing to the strong demand for improving the coding efficiency of 3D and multi-view videos, driven by the requirements of coding multiple views of data, larger picture resolutions, and better quality, various techniques have been proposed.
As an extension to HEVC and a next-generation 3D video coding standard, standardization of the 3D-HEVC video coding standard was formally launched by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) in July 2012 and was finalized after the 11th JCT-3V meeting held in February 2015. In order to support auto-stereoscopic multi-view displays more practically, the multi-view video plus depth (MVD) format has been introduced as a new 3D video format for 3D-HEVC. The MVD format consists of a texture picture and its associated depth map. Unlike a texture picture, which represents the luminance and chrominance information of an object, a depth map is an image containing information relating to the distance of the objects from the camera-captured plane, and it is generally employed for virtual view rendering as non-visual information.
Virtual reality (VR) with head-mounted displays (HMDs) is associated with a variety of applications. The ability to show wide field-of-view content to a user can be used to provide immersive visual experiences. A real-world environment has to be captured in all directions, resulting in an omnidirectional video corresponding to a viewing sphere. With advances in camera rigs and HMDs, the delivery of VR content may soon become the bottleneck due to the high bitrate required for representing such content. Since omnidirectional videos often have 4K or higher resolution, compression is critical to reduce the bitrate. The omnidirectional videos are provided in the equirectangular projection format. FIG. 1 illustrates an example of an image of an omnidirectional video (known as “Hangpai_2”) in the equirectangular projection format. While the original image is in full color, a black and white version is shown in FIG. 1, since the black and white image is sufficient to illustrate the present invention.
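As a rough illustration of how the equirectangular projection format relates a point on the viewing sphere to a position in the rectangular image, the following Python sketch maps longitude and latitude to pixel coordinates and back. The function names, the angle convention (longitude across the width, north pole on the top row), and the 4096×2048 picture size are assumptions made for this example and are not taken from any particular standard or from FIG. 1.

```python
import math


def sphere_to_equirect(lon, lat, width, height):
    """Map longitude/latitude in radians to continuous pixel coordinates.

    Assumed convention (hypothetical, for illustration only):
    lon in [-pi, pi) spans the image width from left to right, and
    lat in [-pi/2, pi/2] spans the height with +pi/2 on the top row.
    """
    u = (lon + math.pi) / (2.0 * math.pi)  # 0..1 across the width
    v = (math.pi / 2.0 - lat) / math.pi    # 0..1 down the height
    return u * width, v * height


def equirect_to_sphere(x, y, width, height):
    """Inverse of sphere_to_equirect under the same assumed convention."""
    lon = (x / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (y / height) * math.pi
    return lon, lat


# The point straight ahead (lon = 0, lat = 0) lands at the image centre.
center = sphere_to_equirect(0.0, 0.0, 4096, 2048)
```

Because every latitude occupies a full image row regardless of how small the corresponding circle on the sphere is, regions near the poles are stretched horizontally in this format.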
The equirectangular format can be converted into different formats as shown in FIG. 2: (a) cubemap, (b) Cubemap_32, (c) Cubemap_180, (d) Plane_poles, (e) Plane_poles_6, (f) Plane_poles_cubemap, (g) Plane_cubemap, (h) Plane_cubemap_32, (i) Flat_fixed, (j) 180 degree 3D video (i.e., 180-3D) and (k) Cylindermap/Cylindrical. Images in FIG. 2a through FIG. 2i are based on the image in FIG. 1.
FIG. 3 illustrates an example of converting the equirectangular projection format into a cubic format using projection conversion 310, where images labelled from 1 to 6 correspond to images on the six faces of a cube for representing a 360-degree video. Four commonly used layouts (i.e., 1×6-layout 410, 2×3-layout 420, 3×2-layout 430, and 6×1-layout 440) are illustrated in FIG. 4. In each layout, the images from the six faces are assembled into one single rectangular image. FIG. 5 illustrates the geometry comparison between the equirectangular format and the cubic format. Equirectangular geometry 510 and cubic geometry 520 are shown in FIG. 5. Image 512 is an example of the equirectangular format and image 522 is an example of the cubic format.
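The assembly of six cubic faces into one rectangular image can be sketched as follows. This is a minimal illustration only: the helper names `assemble_layout` and `make_face`, the reading of "R×C" as R rows by C columns, and the row-major placement of faces 1 to 6 are assumptions for this example and are not taken from FIG. 4.

```python
def assemble_layout(faces, rows, cols):
    """Tile equally sized square faces into one rows x cols image.

    faces: list of 2D lists, each face being a list of pixel rows.
    Faces are placed in row-major order (an assumed, fixed assignment).
    """
    if rows * cols != len(faces):
        raise ValueError("layout must have exactly one slot per face")
    size = len(faces[0])
    for face in faces:
        if len(face) != size or any(len(row) != size for row in face):
            raise ValueError("all faces must be square and of equal size")
    image = []
    for r in range(rows):
        for y in range(size):
            line = []
            for c in range(cols):
                # Append scan line y of the face in slot (r, c).
                line.extend(faces[r * cols + c][y])
            image.append(line)
    return image


# Assumed interpretation of the four layout names as (rows, cols).
LAYOUTS = {"1x6": (1, 6), "2x3": (2, 3), "3x2": (3, 2), "6x1": (6, 1)}


def make_face(label, size):
    """Build a dummy face whose pixels all carry the face label."""
    return [[label] * size for _ in range(size)]


faces = [make_face(label, 2) for label in range(1, 7)]
image_2x3 = assemble_layout(faces, *LAYOUTS["2x3"])
image_6x1 = assemble_layout(faces, *LAYOUTS["6x1"])
```

With 2×2 dummy faces, the 2×3 case yields a 4-row by 6-column image and the 6×1 case yields a 12-row by 2-column image. All four layouts carry the same number of samples and differ only in picture dimensions and in which faces share edges.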
In the existing approach for converting cubic faces into an output format, the same selected output layout format is always used and the six faces are assigned to the output layout format in a fixed manner. While the fixed mapping is simple, it prevents a user from using other layout formats to meet the user's needs. Furthermore, after the cubic faces are converted to an output layout format, the converted output images are often compressed to reduce the required space. The selected output layout format and the fixed mapping may not be efficient for compression.