360-degree video, also known as immersive video, is an emerging technology that can provide a “feeling as sensation of present”, that is, a sense of being present in the scene. The sense of immersion is achieved by surrounding the user with a wrap-around scene that covers a panoramic view, in particular a 360-degree field of view. This sense of presence can be further improved by stereoscopic rendering. Accordingly, panoramic video is widely used in Virtual Reality (VR) applications.
Immersive video involves capturing a scene using multiple cameras to cover a panoramic view, such as a 360-degree field of view. An immersive camera usually uses a set of cameras arranged to capture a 360-degree field of view; typically, two or more cameras are used. All videos must be captured simultaneously, and separate fragments (also called separate perspectives) of the scene are recorded. Furthermore, the set of cameras is often arranged to capture views horizontally, although other arrangements of the cameras are possible.
FIG. 1 illustrates an exemplary processing chain for 360-degree spherical panoramic pictures. The 360-degree spherical panoramic pictures may be captured using a 360-degree spherical panoramic camera, such as a 3D capture device. Spherical image processing unit 110 accepts the raw image data from the 3D capture device to form 360-degree spherical panoramic pictures. The spherical image processing may include image stitching and camera calibration; since this processing is known in the field, the details are omitted in this disclosure. An example of a 360-degree spherical panoramic picture from the spherical image processing unit 110 is shown as picture 112 in FIG. 1. If the camera is oriented so that its top points up, the top side of the 360-degree spherical panoramic picture corresponds to the vertical top (or sky) and the bottom side points to the ground. If the camera is equipped with a gyro, the vertical top side can always be determined regardless of how the camera is oriented. In the 360-degree spherical panoramic format, the contents of the scene appear distorted. Often, the spherical format is projected onto the faces of a cube as an alternative 360-degree format. The conversion can be performed by a projection conversion unit 120 to derive the six face images 122 corresponding to the six faces of a cube. These six face images are connected to one another along the edges of the cube.
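A minimal sketch of what a projection conversion unit such as unit 120 might compute is shown below. It assumes that the spherical panoramic picture is stored in equirectangular (longitude/latitude) format and uses nearest-neighbour sampling. The face names, the axis convention in `FACES`, and the helper names `face_pixel_to_equirect` and `equirect_to_cube` are hypothetical illustrations; they are not taken from this disclosure and do not reproduce the face labels of FIG. 2.

```python
import math

# Hypothetical face convention (it does NOT reproduce the labels 1-6 of FIG. 2).
# Axes: +z is the front viewing direction, +x points right, +y points up.
# Each face is (outward normal, in-face "right" axis, in-face "down" axis).
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, -1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, -1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, -1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, -1, 0)),
    "top": ((0, 1, 0), (1, 0, 0), (0, 0, 1)),
    "bottom": ((0, -1, 0), (1, 0, 0), (0, 0, -1)),
}


def face_pixel_to_equirect(face, i, j, face_size, width, height):
    """Map pixel (row i, column j) of a cube face to the (row, column) of the
    nearest sample in a height x width equirectangular picture."""
    normal, right, down = FACES[face]
    # Pixel centre in face-plane coordinates, each in [-1, 1].
    a = 2.0 * (j + 0.5) / face_size - 1.0
    b = 2.0 * (i + 0.5) / face_size - 1.0
    # 3D viewing direction through that point on the cube surface.
    x, y, z = (normal[k] + a * right[k] + b * down[k] for k in range(3))
    lon = math.atan2(x, z)                 # longitude, 0 at the front-face centre
    lat = math.atan2(y, math.hypot(x, z))  # latitude, +pi/2 is straight up
    col = (lon + math.pi) / (2.0 * math.pi) * width
    row = (math.pi / 2.0 - lat) / math.pi * height
    # Clamp so that lon = pi or lat = -pi/2 stays inside the picture.
    return min(int(row), height - 1), min(int(col), width - 1)


def equirect_to_cube(pano, face_size):
    """Nearest-neighbour projection of an equirectangular picture (a list of
    rows) onto six face_size x face_size face images."""
    height, width = len(pano), len(pano[0])
    faces = {}
    for name in FACES:
        face = []
        for i in range(face_size):
            line = []
            for j in range(face_size):
                r, c = face_pixel_to_equirect(name, i, j, face_size, width, height)
                line.append(pano[r][c])
            face.append(line)
        faces[name] = face
    return faces
```

With this convention, the right edge of `front` and the left edge of `right` map to the same 3D directions, which is the edge connectivity described above. Nearest-neighbour sampling keeps the sketch short; a practical converter would typically interpolate between samples.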
In order to preserve the continuity of neighboring cubic faces sharing a common cubic edge, various cubic-face assembly techniques have been disclosed in a related U.S. Non-provisional patent application Ser. No. 15/390,954, filed on Dec. 27, 2016, with some common inventors and the same assignee. The assembled cubic-face frames may help to improve coding efficiency. Accordingly, cubic face assembler 130 is used to assemble the six cubic faces into an assembled cubic-face frame, and the assembled image sequence is then subject to further processing. The cubic face assembler 130 may generate fully connected or partially connected cubic-face frames. Since 360-degree image sequences may require large storage space or high transmission bandwidth, video encoding by a video encoder 140 may be applied to the video sequence consisting of assembled cubic-face frames. At the receiver side or display side, the compressed video data is decoded using a video decoder 150 to recover the sequence of assembled cubic-face frames for display on a display device (e.g., a 3D display). Information related to the assembled cubic-face frames may be provided to the video encoder 140 for efficient and/or proper encoding and for appropriate rendering.
FIG. 2 illustrates an example of the projection conversion process, which projects a spherical panoramic picture onto six cubic faces of a cube 210. The six cubic faces are separated into two groups. The first group 220 corresponds to the three cubic faces, labelled as 3, 4 and 5, that are visible from the front side. The second group 230 corresponds to the three cubic faces, labelled as 1, 2 and 6, that are visible from the back side of the cube.
In conventional video coding or processing, the coding or processing system always assumes that the input is a sequence of rectangular video frames. Therefore, the cubic faces are further assembled into cubic-face frames. FIG. 3A illustrates two examples of cubic-face assembled frames (310 and 320) with blank areas, where the two sets of fully interconnected cubic faces correspond to two different ways of unfolding the six faces of the cube. The unfolded cubic faces (also called a cubic net) are fitted into the smallest rectangular frame that contains them, and the blank areas are filled with dummy data.
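As an illustration of fitting a cubic net into its smallest enclosing rectangle, the following sketch places six N x N face images (lists of rows) according to a hypothetical cross-shaped net and fills the remaining blocks with a dummy value. The layout `CROSS_NET`, the fill value `DUMMY`, and the function `assemble_net` are assumptions made for illustration; they are not claimed to match frame 310 or 320 of FIG. 3A.

```python
DUMMY = 0  # hypothetical value used to fill the blank areas

# Hypothetical cross-shaped cubic net: (face name, block row, block column).
# This is only one of the possible unfoldings of the cube.
CROSS_NET = [
    ("top", 0, 1),
    ("left", 1, 0), ("front", 1, 1), ("right", 1, 2), ("back", 1, 3),
    ("bottom", 2, 1),
]


def assemble_net(faces, net=CROSS_NET, dummy=DUMMY):
    """Place six N x N face images of a cubic net into the smallest
    rectangular frame that contains the net; blank samples get `dummy`."""
    n = len(faces[net[0][0]])
    block_rows = 1 + max(r for _, r, _ in net)
    block_cols = 1 + max(c for _, _, c in net)
    frame = [[dummy] * (block_cols * n) for _ in range(block_rows * n)]
    for name, br, bc in net:
        for i, line in enumerate(faces[name]):
            # Copy one row of the face into its block position in the frame.
            frame[br * n + i][bc * n:(bc + 1) * n] = line
    return frame
```

For this cross net the enclosing rectangle is 3 x 4 faces, so half of the frame (six face-sized blocks) carries dummy data, which is the overhead that the blank-free layouts avoid.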
FIG. 3B illustrates examples of another type of cubic-face assembly, where the six faces are compactly fitted into a rectangular frame without any blank area. In FIG. 3B, frames 330, 340, 350 and 360 correspond to 1×6, 2×3, 3×2 and 6×1 assembled cubic frames, respectively.
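The blank-free layouts can be sketched as tiling the six faces into a grid whose block count is exactly six. In this sketch "rows x cols" is read as faces stacked vertically by faces placed horizontally, which is an assumption; the face order `FACE_ORDER` and the function `assemble_compact` are likewise hypothetical, and face rotations are not modelled.

```python
# Hypothetical face order; which face edges end up continuous in the frame
# depends on this order (and on any face rotation, not modelled here).
FACE_ORDER = ["front", "right", "back", "left", "top", "bottom"]


def assemble_compact(faces, rows, cols, order=FACE_ORDER):
    """Tile six N x N face images into a rows x cols grid of faces with no
    blank area (rows = faces stacked vertically, cols = faces per row)."""
    if rows * cols != 6:
        raise ValueError("a compact layout needs exactly six face positions")
    n = len(faces[order[0]])
    frame = []
    for br in range(rows):          # block row
        for i in range(n):          # sample row inside the block row
            line = []
            for bc in range(cols):  # block column
                line.extend(faces[order[br * cols + bc]][i])
            frame.append(line)
    return frame
```

For example, `assemble_compact(faces, 2, 3)` returns a frame of 2N rows by 3N columns, and every sample comes from one of the six faces.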
FIG. 4A illustrates an exemplary block diagram of a video encoder system, such as HEVC (High Efficiency Video Coding), incorporating adaptive Inter/Intra prediction. The system includes two prediction modes: Inter prediction 420 and Intra prediction 430. The Inter prediction 420 utilizes motion estimation (ME) and motion compensation (MC) to generate a temporal prediction for a current frame 410 based on one or more previously reconstructed pictures. The previously reconstructed pictures, also referred to as reference pictures, are stored in the Frame Buffer 480. As is known in the field, the ME for Inter prediction uses a translational motion model, where the motion can be specified by an associated motion vector. The Intra prediction 430 generates a predictor for a current block by using reconstructed pixels in neighboring blocks of the same slice or picture. A switch 445 is used to select between the Inter prediction 420 and the Intra prediction 430. The selected prediction is subtracted from the corresponding signal of the current frame using an Adder 440 to generate prediction residuals. The prediction residuals are processed using DCT (Discrete Cosine Transform) and Quantization (DCT/Q) 450, followed by the Entropy Coder 460, to generate the video bitstream. Since reconstructed pictures are also required at the encoder side to form reference pictures, Inverse Quantization and Inverse DCT (IQ/IDCT) 452 are also used to generate reconstructed prediction residuals. The reconstructed residuals are then added to the prediction selected by the switch 445 using another Adder 442 to form reconstructed video data. In-loop Filtering 470 is often used to reduce coding artifacts due to compression before the reconstructed video is stored in the Frame Buffer 480. For example, the deblocking filter and Sample Adaptive Offset (SAO) are used in HEVC. Adaptive Loop Filter (ALF) is another type of in-loop filter that may be used to reduce artifacts in coded images.
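The residual path of FIG. 4A (Adder 440, DCT/Q 450, IQ/IDCT 452, Adder 442) can be sketched for a single block as follows. This is a simplification under stated assumptions: it uses a floating-point orthonormal 4 x 4 DCT-II and a plain uniform scalar quantizer with step `qstep`, whereas HEVC specifies integer approximations of the DCT and a more elaborate quantization process. The names `BLOCK`, `code_block` and the helpers are hypothetical; entropy coding and in-loop filtering are omitted.

```python
import math

BLOCK = 4  # hypothetical transform block size


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II matrix C, so that C times C^T is the identity."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
             for m in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


C = dct_matrix()
CT = transpose(C)


def forward_dct(x):
    return matmul(matmul(C, x), CT)  # Y = C X C^T


def inverse_dct(y):
    return matmul(matmul(CT, y), C)  # X = C^T Y C


def code_block(current, prediction, qstep):
    """Return the quantized levels (input to the entropy coder) and the
    reconstructed block (input to in-loop filtering / the frame buffer)."""
    # Adder 440: prediction residual.
    residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]
    # DCT/Q 450: transform, then uniform scalar quantization.
    levels = [[round(v / qstep) for v in row] for row in forward_dct(residual)]
    # IQ/IDCT 452: rebuild the residual exactly as a decoder would.
    recon_residual = inverse_dct([[lv * qstep for lv in row] for row in levels])
    # Adder 442: add the selected prediction back.
    recon = [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(prediction, recon_residual)]
    return levels, recon
```

Because the encoder reconstructs from the quantized levels rather than from the original residual, its reference pictures match the decoder's. Since the transform is orthonormal, each reconstructed sample in this sketch differs from the original by at most 2 x `qstep`.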
FIG. 4B illustrates an example of a decoder system block diagram corresponding to the encoder in FIG. 4A. As shown in FIG. 4A, the encoder side also includes a decoder loop to reconstruct the reference video. Therefore, most decoder components are already used at the encoder side, except for the Entropy Decoder 461. Furthermore, the Inter prediction decoder 421 requires only motion compensation, since the motion vectors can be derived from the video bitstream and there is no need to search for the best motion vectors.
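The asymmetry between encoder-side motion estimation and decoder-side motion compensation can be sketched as follows, assuming integer-sample motion vectors, a sum-of-absolute-differences (SAD) cost, and an exhaustive search that stays inside the reference picture. HEVC also supports fractional-sample motion vectors with interpolation, and practical encoders usually use faster search strategies; the function names here are hypothetical.

```python
def get_block(frame, y, x, size):
    """size x size block whose top-left sample is (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_estimate(cur, ref, y, x, size, search_range):
    """Encoder only: exhaustive integer-sample search for the motion vector
    (dy, dx) within +/- search_range that minimizes SAD."""
    target = get_block(cur, y, x, size)
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # keep this sketch inside the reference picture
            cost = sad(target, get_block(ref, ry, rx, size))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv


def motion_compensate(ref, y, x, size, mv):
    """Encoder and decoder: fetch the translational prediction block.
    At the decoder, mv is parsed from the bitstream, so no search is needed."""
    dy, dx = mv
    return get_block(ref, y + dy, x + dx, size)
```

The search cost grows with the square of `search_range`, while `motion_compensate` is a single block fetch, which is why the decoder of FIG. 4B only needs the latter.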
As shown in FIG. 4A and FIG. 4B, a coding system often applies filtering to the reconstructed image in order to enhance visual quality by reducing coding artifacts. In other video processing systems, filtering may also be applied to the underlying frames to reduce noise or to enhance image quality. However, assembled frames converted from 3D source video may contain special features, such as discontinuities at boundaries between faces that are adjacent in the frame but not on the cube, that may cause artifacts or reduce coding efficiency when conventional filtering is applied. Accordingly, the present invention addresses filtering issues associated with assembled cubic frames.