A conventional video carries only 2D image information. It presents the content of an object but ignores depth information, such as the distance and position of the object, and is therefore incomplete. To have a visual experience equivalent to viewing the world with both eyes, a viewer needs more spatial information than a 2D image provides.
In 3D video technology, pictures conform to the principles of human 3D vision and provide depth information. 3D video technology therefore presents views of the external world on the screen realistically and renders the objects of a scene with depth, layering, and realism, which makes it an important trend in video technology. The depth information of a scene is important in a 3D video system. A depth image is also known as a parallax image of the scene. In the conventional art, the following methods are available for obtaining a depth image of a scene:
One method is to obtain the depth image of a scene through 3D image matching. That is, cameras photograph the scene to obtain multiple color images, which are 2D images of the scene, and the color images are analyzed and processed to obtain the depth image of the scene. The basic principle is as follows: for a point in the scene, find its corresponding imaging point in each of the multiple color images, and then calculate the coordinates of the point in space from its coordinates in those color images to obtain the depth information of the point.
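As an illustration of the triangulation step, the sketch below assumes the simplest geometry, which the text does not specify: two calibrated pinhole cameras forming a rectified stereo pair with focal length `focal_px` (in pixels), baseline `baseline_m` (in meters), and principal point (`cx`, `cy`). Under that assumption, the depth of a matched point follows from its horizontal disparity d as Z = f·B/d. All function and parameter names are hypothetical.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_m: float) -> float:
    """Depth Z of a scene point imaged at column x_left in the left image and
    x_right in the right image of a rectified stereo pair (assumed geometry)."""
    disparity = x_left - x_right  # horizontal shift between the two imaging points
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity  # Z = f * B / d


def point_from_match(x_left: float, y: float, x_right: float,
                     focal_px: float, baseline_m: float,
                     cx: float, cy: float) -> tuple:
    """Back-project a matched point to 3D coordinates in the left camera frame."""
    z = depth_from_disparity(x_left, x_right, focal_px, baseline_m)
    x3 = (x_left - cx) * z / focal_px  # pinhole model: x_pixel - cx = f * X / Z
    y3 = (y - cy) * z / focal_px       # same relation for the vertical axis
    return (x3, y3, z)
```

For example, with f = 700 px, B = 0.1 m, and a point seen at column 420 in the left image and column 385 in the right image, d = 35 px and Z = 2.0 m. Because Z is inversely proportional to d, a one-pixel matching error changes the depth of distant points (small d) far more than that of near points.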
The 3D image matching technology includes window-based matching methods and dynamic programming methods, both of which employ algorithms based on grayscale matching. A grayscale-matching algorithm divides a color image into small subareas, uses the grayscale values of a subarea as a template, and searches the other color images for a subarea with a similar grayscale distribution. If two subareas meet the similarity requirement, the points in the two subareas are regarded as matching. In the matching process, correlation functions are generally used to measure the similarity of two areas. Algorithms based on grayscale matching can obtain a dense depth image of the scene.
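A minimal sketch of window-based matching on a single scanline, assuming rectified images so that corresponding points lie on the same row and the right-image match of left column x lies at column x - d. As the similarity measure it uses the sum of absolute differences (SAD), a common and simple choice swapped in here in place of the correlation functions mentioned above; a normalized cross-correlation score could be substituted. The function names and the parameters `half_window` and `max_disparity` are hypothetical.

```python
def sad(a, b):
    """Sum of absolute grayscale differences between two equal-length windows."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_scanline(left, right, half_window, max_disparity):
    """Dense disparity for one rectified scanline by window-based matching.

    For each pixel x in `left`, the window centred on x is compared with
    windows centred on x - d in `right` for d = 0..max_disparity, and the d
    with the lowest SAD cost is kept (the first one wins on ties). Border
    pixels whose window does not fit in the row are left as None.
    """
    n = len(left)
    disparities = [None] * n
    for x in range(half_window, n - half_window):
        ref = left[x - half_window:x + half_window + 1]  # template window
        best_d, best_cost = None, float("inf")
        for d in range(max_disparity + 1):
            xr = x - d
            if xr - half_window < 0:
                break  # candidate window would leave the right image
            cand = right[xr - half_window:xr + half_window + 1]
            cost = sad(ref, cand)
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities
```

For example, with `left = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]` and `right = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]` (the same row shifted two pixels to the left), `match_scanline(left, right, 1, 3)` recovers disparity 2 at columns 3 to 6 around the intensity peak. At columns 7 and 8, where the row is flat, every candidate disparity costs 0 and the first one tried (0) is kept, which illustrates how uniform or repeated texture makes grayscale matching ambiguous.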
Moreover, 3D image matching may be performed through an algorithm based on feature matching, which uses features extracted from the grayscale information of the color images to perform matching. Compared with algorithms that match on simple luminance and grayscale variation, feature-based matching is more stable and accurate. The features used for matching are potentially important features that describe the 3D structure of a scene, such as edges and vertices. A feature-based algorithm first obtains a sparse depth image of the scene and then uses a method such as interpolation to obtain a dense depth image.
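The text leaves the densification method open ("a method such as interpolation"). As one simple possibility, the sketch below assumes that disparities are known only at a few feature columns of one scanline (for example, at edge points) and fills the remaining columns by linear interpolation, holding the end values constant beyond the outermost features. The function name and input format are hypothetical, and a real depth image would be interpolated in two dimensions.

```python
import bisect


def densify_scanline(width, sparse):
    """Fill a dense disparity row from sparse feature matches.

    `sparse` maps column -> disparity for matched features. Columns between
    two features are linearly interpolated; columns before the first or after
    the last feature copy that feature's value.
    """
    if not sparse:
        raise ValueError("need at least one matched feature")
    cols = sorted(sparse)
    dense = [0.0] * width
    for x in range(width):
        if x <= cols[0]:
            dense[x] = float(sparse[cols[0]])
        elif x >= cols[-1]:
            dense[x] = float(sparse[cols[-1]])
        else:
            # Neighbouring features satisfy x0 < x <= x1.
            i = bisect.bisect_left(cols, x)
            x0, x1 = cols[i - 1], cols[i]
            t = (x - x0) / (x1 - x0)
            dense[x] = (1 - t) * sparse[x0] + t * sparse[x1]
    return dense
```

For example, `densify_scanline(7, {1: 4, 5: 8})` returns `[4.0, 4.0, 5.0, 6.0, 7.0, 8.0, 8.0]`. Linear interpolation also smooths across depth discontinuities between features, which is one reason the interpolated dense image is only an approximation.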
Another method is to obtain the depth image of a scene through a single depth camera.
The basic principle of a depth camera is to determine the distance of an object by emitting infrared light and detecting the intensity of the infrared light reflected by objects in the scene. The depth image output by a depth camera is of high quality and high precision, and the technology has good application prospects. Currently, depth cameras are primarily used for gesture recognition and for background replacement and synthesis, and are seldom applied in 3D video systems. Generally, only a single depth camera is used to capture the video images of a scene.
When a single depth camera is used to capture the video images of a scene, the depth image is precise, but the camera can obtain only one color image of the scene from a single viewpoint, together with the corresponding depth image. A good reconstruction effect can be achieved when reconstructing images of virtual viewpoints with small parallax. When reconstructing images of virtual viewpoints with large parallax, however, too few color images are available and the color image information is insufficient, so large “cavities” appear in the reconstructed virtual-viewpoint images and cannot be repaired. As a result, the reconstructed images are seriously distorted and the reconstruction effect is poor.
FIG. 1 shows how cavities are generated when images of virtual viewpoints are reconstructed from video images captured by a single depth camera in the conventional art. Assume that video images of object 1a and object 1b are captured at viewpoint o1. Because object 1b occludes part 1a0 of object 1a, the captured video image information includes only part of the image information of object 1a and the image information of object 1b, and does not include the image information of part 1a0 of object 1a. When images of object 1a and object 1b are reconstructed for viewpoint o2, the captured information lacks the image information of part 1a0 of object 1a, so the image reconstructed at viewpoint o2 lacks the image of part 1a0, and a cavity is generated at part 1a0. Therefore, the reconstructed images are seriously distorted and the reconstruction effect is poor.
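The occlusion effect in FIG. 1 can be reproduced on a single scanline with a minimal depth-image-based forward warp. This is a generic illustration of the conventional single-camera problem, not the method of the present invention. The sketch assumes a rectified geometry in which a pixel with disparity d moves by `round(shift_scale * d)` columns toward the virtual viewpoint, where the hypothetical parameter `shift_scale` stands in for the distance to that viewpoint; when two pixels land on the same column, the one with the larger disparity (the nearer surface) wins. Columns that receive no pixel stay `None`, corresponding to the cavities described above. The helper names are hypothetical.

```python
def warp_scanline(colors, disparities, shift_scale):
    """Forward-warp one scanline to a virtual viewpoint (assumed geometry).

    Each source pixel x moves to column x - round(shift_scale * disparity[x]).
    A z-buffer on disparity keeps the nearest surface when pixels collide.
    Target columns that receive no pixel remain None (cavities).
    """
    n = len(colors)
    out = [None] * n
    best = [float("-inf")] * n  # disparity of the surface written at each column
    for x in range(n):
        d = disparities[x]
        xt = x - round(shift_scale * d)
        if 0 <= xt < n and d > best[xt]:
            out[xt] = colors[x]
            best[xt] = d
    return out


def cavities(row):
    """Columns of a warped scanline that received no color information."""
    return [i for i, c in enumerate(row) if c is None]


if __name__ == "__main__":
    # Background (disparity 1) at columns 0-3 and 9-13,
    # foreground object (disparity 3) at columns 4-8.
    colors = list("abcdFGHIJklmno")
    disparities = [1] * 4 + [3] * 5 + [1] * 5
    print(cavities(warp_scanline(colors, disparities, 1)))  # [6, 7, 13]
    print(cavities(warp_scanline(colors, disparities, 2)))  # [3, 4, 5, 6, 12, 13]
```

Doubling the shift doubles the gap behind the foreground object (columns 6 to 7 versus columns 3 to 6). That gap is background that was hidden at the original viewpoint, so no captured pixel can fill it, mirroring the growth of cavities with parallax described in the text; the remaining cavities at the right edge come from content outside the original field of view.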
In the process of implementing the present invention, the inventor found at least the following defects in the conventional art: The 3D matching algorithm depends on the luminance and chrominance information of the scene and is vulnerable to uneven illumination, camera noise, and repeated textures in the scene. Therefore, the obtained parallax/depth image contains many errors, virtual viewpoints reconstructed from the depth image are of poor quality, and the reconstructed images are inaccurate. Moreover, the 3D matching algorithm is complex, so the parallax/depth image is difficult to obtain in real time, which hinders the commercial application of the technology. When images of virtual viewpoints with large parallax are reconstructed from a single depth camera, large “cavities” are generated that cannot be repaired; as a result, the reconstructed images are seriously distorted, the reconstruction effect is poor, and the practicality is low.