In a conventional single-viewpoint television system, the view and viewpoint of a user are determined by the three-dimensional spatial position and direction of the camera, so the user cannot freely select the view and viewpoint for watching. The free viewpoint television (FTV) system proposed in 2002 allows the user to freely select the view and viewpoint for watching television, thereby providing a new, more vivid and realistic three-dimensional audio-visual system.
A key technology of FTV is the acquisition of accurate depth information, for example by stereo matching, that is, parallax estimation performed on two views shot by two adjacent cameras.
In the prior art, a global optimization algorithm is applied to the parallax estimation. The global optimization algorithm constructs an energy model that satisfies certain constraint conditions, as shown in Formula (1), and minimizes the global mismatch energy. The accuracy of the parallax estimation depends on whether the energy function accurately expresses the actual matching extent between corresponding pixels under different assumed parallaxes, and the smoothness between the parallaxes of adjacent pixels at the same depth, that is, Edata and Esmooth in Formula (1).

$$
\begin{aligned}
E &= E_{\mathrm{data}} + \lambda \times E_{\mathrm{smooth}} \\
E_{\mathrm{data}} &= \left| I(x, y) - I_{\mathrm{ref}}(x - d, y) \right| \\
E_{\mathrm{smooth}} &= \left| \mathrm{disp}(x, y) - \mathrm{disp}(x_{\mathrm{neighbor}}, y_{\mathrm{neighbor}}) \right|
\end{aligned}
\tag{1}
$$
In Formula (1), Edata represents the matching error when the parallax of the current pixel is d, I(x, y) represents the luminance value of the current pixel, and Iref(x−d, y) represents the luminance value of the matching pixel in the reference view when the parallax is d. As in the matching policy of local matching, Esmooth represents the absolute difference between the parallaxes of two adjacent pixels, which measures the smoothness of the parallaxes of the adjacent pixels.
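The energy model of Formula (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the prior art. Images are nested lists indexed as img[y][x], the reference column is clamped at the image border, and Esmooth is accumulated over the right and lower neighbours of each pixel. The function names data_cost, smooth_cost and total_energy are hypothetical.

```python
def data_cost(cur, ref, x, y, d):
    """E_data = |I(x, y) - Iref(x - d, y)| for one pixel."""
    # Clamping the reference column at the border is an assumption of this sketch.
    xr = min(max(x - d, 0), len(ref[0]) - 1)
    return abs(cur[y][x] - ref[y][xr])


def smooth_cost(disp, x, y):
    """E_smooth accumulated over the right and lower neighbours of (x, y)."""
    h, w = len(disp), len(disp[0])
    cost = 0
    if x + 1 < w:
        cost += abs(disp[y][x] - disp[y][x + 1])
    if y + 1 < h:
        cost += abs(disp[y][x] - disp[y + 1][x])
    return cost


def total_energy(cur, ref, disp, lam):
    """Sum of E_data + lam * E_smooth over all pixels of the current view."""
    energy = 0.0
    for y in range(len(cur)):
        for x in range(len(cur[0])):
            energy += data_cost(cur, ref, x, y, disp[y][x])
            energy += lam * smooth_cost(disp, x, y)
    return energy


# One scanline in which the current view equals the reference shifted by one pixel.
cur = [[10, 10, 20, 30]]
ref = [[10, 20, 30, 40]]
print(total_energy(cur, ref, [[1, 1, 1, 1]], 2.0))  # correct, smooth parallax map
print(total_energy(cur, ref, [[0, 0, 0, 0]], 2.0))  # wrong parallax, high E_data
```

A global optimization algorithm would search for the parallax map disp that minimizes this energy; the search itself is outside the scope of this sketch.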
Meanwhile, in Prior Art 1, in order to increase the accuracy of the parallax estimation, a sub-pixel search policy is adopted for computing Edata in Formula (1) in the FTV standardization formulation procedure.
In the implementation of the present invention, the inventors of the present invention find that, in the prior art, interpolation is performed only in the horizontal direction, so that the interpolation result is not accurate enough, and consequently the parallax estimation result is not accurate either.
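A sub-pixel search that interpolates only along the horizontal direction might look like the following sketch. The text does not give the interpolation filter or the sub-pixel step used in Prior Art 1, so linear interpolation and a half-pixel step are assumptions made here for illustration. The names sample_horizontal and best_subpixel_parallax are hypothetical.

```python
import math


def sample_horizontal(row, xf):
    """Linearly interpolate one image row at fractional column xf (clamped)."""
    xf = min(max(xf, 0.0), len(row) - 1.0)
    x0 = int(math.floor(xf))
    x1 = min(x0 + 1, len(row) - 1)
    t = xf - x0
    # Only the two horizontal neighbours contribute to the interpolated value.
    return (1.0 - t) * row[x0] + t * row[x1]


def best_subpixel_parallax(cur_row, ref_row, x, max_d, step=0.5):
    """Return (d, cost) minimising |I(x) - Iref(x - d)| over d = 0, step, ..., max_d."""
    best_d, best_cost = 0.0, float("inf")
    for k in range(int(round(max_d / step)) + 1):
        d = k * step
        cost = abs(cur_row[x] - sample_horizontal(ref_row, x - d))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


ref_row = [0, 10, 20, 30, 40]
cur_row = [0, 0, 0, 15, 0]  # the value 15 lies halfway between ref columns 1 and 2
print(best_subpixel_parallax(cur_row, ref_row, 3, 3))
```

Because each interpolated sample depends only on its left and right neighbours, any luminance variation in the vertical direction is ignored.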
In Prior Art 2, at the 85th MPEG conference held in Hanover in July 2008, GIST proposed adding a time consistency constraint when the error function Edata is matched, that is, adding a time consistency constraint item to the expression of Edata in Formula (1), as shown in Formula (2).

$$
\begin{aligned}
E_{\mathrm{data}} &= \left| I(x, y) - I_{\mathrm{ref}}(x, y - d) \right| + C_{\mathrm{temp}}(x, y, d) \\
C_{\mathrm{temp}}(x, y, d) &= \lambda \times \left| d - D_{\mathrm{prev}}(x, y) \right|
\end{aligned}
\tag{2}
$$
In different matching search methods, the term |I(x,y)−Iref(x,y−d)| in Formula (2) may be replaced with other functions; this term does not itself embody the time consistency constraint. Dprev(x,y) represents the parallax of the pixel in the previous frame that has the same coordinates as the pixel (x,y) of the current frame, λ represents a weighting factor, Ctemp(x,y,d) is the time consistency constraint item, and d is the estimated parallax of the current pixel.
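The effect of the time consistency constraint item in Formula (2) can be illustrated with a short sketch. Since the matching term may be replaced with other functions, the sketch takes the matching errors as a precomputed list indexed by candidate parallax; this list, the integer candidate set, and the names temporal_term and pick_parallax are assumptions made for illustration.

```python
def temporal_term(d, d_prev, lam):
    """C_temp(x, y, d) = lam * |d - Dprev(x, y)| for one pixel."""
    return lam * abs(d - d_prev)


def pick_parallax(match_errors, d_prev, lam):
    """Return the candidate d that minimises match_errors[d] + C_temp.

    match_errors[d] is the matching error of candidate parallax d, computed
    by whatever matching function the search method uses.
    """
    return min(
        range(len(match_errors)),
        key=lambda d: match_errors[d] + temporal_term(d, d_prev, lam),
    )


match_errors = [5, 4, 4, 9]  # candidates d = 1 and d = 2 match equally well
print(pick_parallax(match_errors, d_prev=2, lam=0.0))  # no temporal constraint
print(pick_parallax(match_errors, d_prev=2, lam=1.0))  # pulled towards Dprev
```

With λ set to 0 the choice depends on the matching error alone; with a positive λ, candidates far from the parallax of the previous frame are penalized in the same way for every pixel.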
In the implementation of the present invention, the inventors of the present invention find that, in Prior Art 2, the time consistency constraint does not take motion features into account, and a uniform time consistency constraint is also applied to motion areas, so that errors occur in the estimated parallax, that is, the accuracy is very low.