Field of the Invention
The invention relates to the technical field of multi-view video encoding and decoding, and in particular to a virtual viewpoint synthesis method and system.
Description of the Related Art
A multi-view video is a set of video signals obtained by recording the same scene from different angles with multiple cameras at different viewpoints. It is an effective 3D video representation that allows the scene to be perceived more vividly, and it is widely applied in 3D television, free-viewpoint television, 3D video conferencing and video surveillance.
A multi-view video records the scene from different viewpoints with a set of synchronized cameras. During playback, the displayed image corresponds to the viewer's position, so when the viewer's head moves, what the viewer sees changes accordingly. A “circular view” effect is thus created.
To obtain a natural and smooth motion parallax effect, very densely arranged cameras are needed to capture the multi-view video sequence. However, as the number of cameras increases, the data volume of the multi-view video multiplies accordingly, which poses a great challenge for data storage and transmission.
To obtain a high-quality 3D video stream at a low bit rate, multi-view video generally adopts a two-viewpoint-plus-depth format, in which the color video images and the depth images are compressed and encoded separately. Decoding adopts the virtual viewpoint synthesis technique of depth-image-based rendering (DIBR), which uses the left and right viewpoints and their corresponding depth images to generate a multi-view video, and reconstructs a single-angle or multi-angle 3D video (3D video or free-viewpoint video) according to the user's requirements. Virtual viewpoint synthesis is one of the key technologies in encoding and decoding multi-view videos, and the quality of the synthesized images directly affects the viewing quality of multi-view videos.
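By way of illustration only, the warping step at the core of DIBR can be sketched for a single image row. The sketch assumes a rectified, horizontally aligned camera pair, an 8-bit depth map in which larger values are closer to the camera, and an inverse-depth quantization between hypothetical near and far planes z_near and z_far. The function names, parameters and the direction sign are illustrative and are not taken from this description.

```python
def depth_to_disparity(d, focal_px, baseline, z_near, z_far):
    """Convert an 8-bit depth sample (255 = nearest) to a disparity in pixels.

    Assumes inverse-depth quantization between z_near and z_far.
    """
    inv_z = (d / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_px * baseline * inv_z


def warp_row(colors, depths, focal_px, baseline, z_near, z_far, direction):
    """Forward-warp one row of a reference view into the virtual view.

    direction is -1 or +1 depending on which side of the virtual view the
    reference camera lies (an assumed convention). Positions that receive
    no pixel stay None; these are the holes a later step must fill.
    """
    width = len(colors)
    out = [None] * width
    out_depth = [-1] * width  # -1 marks "nothing written yet"
    for x in range(width):
        disp = depth_to_disparity(depths[x], focal_px, baseline, z_near, z_far)
        xv = x + direction * int(round(disp))
        # Keep the nearest pixel when several map to the same position.
        if 0 <= xv < width and depths[x] > out_depth[xv]:
            out[xv] = colors[x]
            out_depth[xv] = depths[x]
    return out


row = warp_row([10, 20, 30, 40], [0, 0, 255, 0],
               focal_px=1.0, baseline=1.0, z_near=1.0, z_far=1000.0,
               direction=-1)
```

In this toy row the single foreground pixel shifts by one position, overwrites a background pixel, and leaves a hole at its original location, which is why the second reference view and its depth image are needed to complete the virtual image.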
A prior-art virtual viewpoint synthesis method maintains only one virtual image throughout the entire process and, for each pixel, searches for candidate pixels in the left and right reference viewpoints at the same time. The candidate pixel set of a pixel may therefore comprise pixels from both viewpoints, and all candidate pixels are used in the subsequent weighted summation without screening. The depth-image-based method also depends heavily on the accuracy of the depth images, and discontinuities in the depth images may further degrade the quality of the synthesized views. Consequently, conventional solutions suffer from problems in either boundary regions or background regions.
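The unscreened weighted summation described above can be pictured with a minimal sketch. The weighting scheme is not specified in this description, so the weights below are hypothetical values supplied by the caller, and the function name is illustrative.

```python
def blend_unscreened(candidates):
    """Weighted average over every candidate pixel, with no screening.

    candidates is a list of (intensity, weight) pairs gathered from both
    the left and the right reference viewpoints. The weights are assumed
    inputs, not values defined by the description.
    """
    total_weight = sum(w for _, w in candidates)
    if total_weight == 0:
        return None  # no usable candidate: the pixel remains a hole
    return sum(c * w for c, w in candidates) / total_weight


# One candidate from a bright foreground object (left view) and one from
# the dark background (right view), with equal hypothetical weights.
mixed = blend_unscreened([(200, 1.0), (40, 1.0)])
```

Here the result is 120.0, an intensity that matches neither the foreground nor the background candidate. This is one plausible way in which mixing candidates from both viewpoints without screening can produce artifacts near object boundaries.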