Through the development of frameworks in image processing, stereo vision has been extensively applied in many fields such as structured light, stereo image, distance detection, surveillance, and so forth. Stereo vision generally includes two stages. In the first stage, depth information would be generated by using a depth camera, a stereo camera, or a related algorithm. In the second stage, a stereo image would be generated by using the depth information. Hence, accurate depth information is rigidly important to provide a pleasant stereo viewing experience.
The fundamental of stereo vision is to simulate binocular disparity by left and right lenses spaced apart by an average distance between two eyes of a human, to generate stereo depth information of a captured scene according to offsets of each pair of corresponding pixels in images captured by the two lenses, and to thereby form a depth map of the scene. An actual distance D between each object and the lenses could be calculated through the use of the depth map based on Eq.(1):
                    D        =                              B            ×            F                    d                                    Eq        .                  (          1          )                    where d denotes a depth value in the depth map, F denotes a focal length of the lenses, and B denotes a distance between optical axes of the lenses. However, a viewable range of the lenses is associated with the length of a baseline, and baselines with different lengths could result in different accuracy levels in estimating depth information at different distances. Hence, a region allowed for depth estimation and an accuracy of depth information would be inherently restricted due to only one baseline existing between the two lenses.