In many known algorithms for estimating the disparity or depth (herein collectively referred to as a depth information value) for a given fragment of a digital image, for instance a pixel or a group of pixels of the digital image, the depth information value is selected as the best one from a set of depth information values under consideration. Often the selection is done by minimizing a cost function Ccurrent(d) with respect to the depth information value d for a currently processed fragment.
This cost function can be a purely local fragment matching error or matching cost Mcurrent(d) like in the well-known “Winner-Takes-All (WTA) algorithm” described, for instance, in D. Scharstein & R. Szeliski “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms”, International Journal of Computer Vision 47, 7-42, 2002. In such algorithms the depth information value for each fragment is selected independently from depth information values of other fragments as
$$d_{best} = \arg\min_{d} \left[ M_{current}(d) \right] \qquad (1)$$
where $\arg\min_{d}[\cdot]$ denotes the selection of the depth information value $d$ for which the expression within the square brackets is minimal.
The matching error or matching cost $M_{current}(d)$ for the position $(x,y)$ of the fragment in the image and the depth information value $d$ associated with the fragment is usually computed using an error function, which determines the difference between the value of the image $I$ in position $(x,y)$ and the value of the reference image $I_{ref}$ (or images) in position $(x+d,y)$. Usually, the term value of the image refers to color channels or a luminance value of the texture image, but may also be combined with a horizontal and vertical gradient. The commonly used error functions are the sum of absolute differences (SAD) given by the following equation (2) or the sum of squared differences (SSD) given by the following equation (3) (see, for instance, H. Hirschmueller and D. Scharstein, “Evaluation of Cost Functions for Stereo Matching”, IEEE Conference on Computer Vision and Pattern Recognition, 2007):
$$M_{current}(d) = SAD(I(x,y), I_{ref}(x+d,y)) = \left| I(x,y) - I_{ref}(x+d,y) \right| \qquad (2)$$
$$M_{current}(d) = SSD(I(x,y), I_{ref}(x+d,y)) = \left( I(x,y) - I_{ref}(x+d,y) \right)^{2} \qquad (3)$$
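The local matching cost and the Winner-Takes-All selection of equation (1) can be illustrated with a minimal Python sketch. It is not taken from the cited references: the function names, the single-channel list-of-rows image layout, the single-pixel fragment and the policy of skipping candidates that fall outside the reference row are all assumptions made for illustration.

```python
def sad(a, b):
    """Absolute difference of two scalar image values, as in equation (2)."""
    return abs(a - b)


def ssd(a, b):
    """Squared difference of two scalar image values, as in equation (3)."""
    return (a - b) ** 2


def matching_costs(image, ref, x, y, candidates, error=sad):
    """Return {d: M_current(d)} for the single-pixel fragment at (x, y).

    Candidates whose reference position x + d lies outside the row are
    skipped; this border policy is an assumption of the sketch.
    """
    width = len(ref[y])
    return {d: error(image[y][x], ref[y][x + d])
            for d in candidates if 0 <= x + d < width}


def wta(costs):
    """Equation (1): the d with minimal matching cost (ties: smallest d)."""
    return min(sorted(costs), key=costs.get)


# Hypothetical one-row images: ref is the image shifted right by one pixel.
image = [[10, 50, 90, 20]]
ref = [[0, 10, 50, 90]]
costs = matching_costs(image, ref, x=1, y=0, candidates=range(3))
print(costs)       # {0: 40, 1: 0, 2: 40}
print(wta(costs))  # 1
```

Because each fragment is decided independently of its neighbors, this kind of selection is cheap but offers no protection against ambiguous or noisy matches.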
In more advanced algorithms (like in Viterbi, Forward or Belief Propagation algorithms, also described in the above-referenced article by D. Scharstein & R. Szeliski) a more sophisticated cost function Ccurrent(d) is used for minimization and selection of the resulting depth information value. In such a case, Ccurrent(d) typically is a sum of Mcurrent(d) with a min-convolution of a transition cost function T with costs related to all considered depth information values in neighboring fragments, i.e. additionally includes a smoothing term. In Forward and Viterbi algorithms, the neighboring fragments are those that have already been processed to obtain their depth information value d, and, therefore, Ccurrent(d) for a given depth information value d accumulates the cost from all previously processed fragments (designated by the index “prev”) considered for the depth information value estimation of the currently processed fragment (designated by the index “current”):
$$C_{current}(d) = C_{prev}(d) \underset{\min}{*} T(q,d) + M_{current}(d), \qquad (4)$$
where $M_{current}(d)$ is the local fragment matching error for the depth information value $d$ as described before, $C_{prev}(d)$ is the cost for a previously processed fragment for the depth information value $d$, $T(q,d)$ is a two-argument transition-cost function (the cost for changing from depth information value $q$ to depth information value $d$), and the operator $\underset{\min}{*}$ denotes the min-convolution, defined as:
$$C_{prev}(d) \underset{\min}{*} T(q,d) = \min_{q} \left( C_{prev}(q) + T(q,d) \right) \qquad (5)$$
wherein $\min_{q}$ denotes the smallest value with respect to $q$, and both $q$ and $d$ belong to the considered range of depth information values (which is typically set a priori according to the parameters of the visual scene under consideration, i.e. object distance to the cameras). An exemplary transition cost function known from the literature is the Potts model:
$$T_{Potts}(q,d) = \begin{cases} 0 & \text{if } d = q \\ penalty & \text{if } d \neq q \end{cases} \qquad (6)$$
The cost for the current fragment Ccurrent(d) is calculated for all depth information values d considered for the depth information value estimation of the currently processed fragment.
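As an illustration of equations (4) to (6), the following Python sketch computes the accumulated cost for one fragment from the costs of the previously processed fragment. The dictionary representation of the cost tables, the function names and the penalty value of 3 are assumptions chosen for the example, not values prescribed by the literature.

```python
def potts(q, d, penalty=3):
    """Potts transition cost of equation (6); the penalty value is arbitrary."""
    return 0 if d == q else penalty


def min_convolution(c_prev, transition, d):
    """Equation (5): minimum over q of C_prev(q) + T(q, d)."""
    return min(c_prev[q] + transition(q, d) for q in c_prev)


def accumulate(c_prev, m_current, transition=potts):
    """Equation (4): C_current(d) for every considered value d."""
    return {d: min_convolution(c_prev, transition, d) + m_current[d]
            for d in m_current}


# Hypothetical cost tables for three candidate depth information values.
c_prev = {0: 5, 1: 0, 2: 30}
m_current = {0: 1, 1: 8, 2: 2}
print(accumulate(c_prev, m_current))  # {0: 4, 1: 8, 2: 5}
```

In this example the value d = 0 ends up cheapest even though the previous fragment favored d = 1, because paying the transition penalty once costs less than the high local matching cost at d = 1. Evaluating equation (5) directly, as done here, takes a number of operations proportional to the square of the number of considered values per fragment.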
In Belief Propagation algorithms, the final selection of depth information values for fragments can be done after multiple iterations of the algorithm.
In Forward algorithms, the decision on the selection of the depth information value is made on the fly on the basis of the accumulated cost:
$$d_{best} = \arg\min_{d} \left[ C_{current}(d) \right].$$
In Viterbi algorithms, the final selection of depth information values is postponed to an additional pass of back-tracking, executed when all cost values are known.
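The difference between the on-the-fly selection of Forward algorithms and the back-tracking of Viterbi algorithms can be sketched in Python for a short sequence of fragments. This is a simplified illustration under stated assumptions: fragments are processed along a single line, each fragment has exactly one previously processed neighbor, and all names, cost tables and the penalty value are hypothetical.

```python
def potts(q, d, penalty=3):
    """Potts transition cost; the penalty value is arbitrary."""
    return 0 if d == q else penalty


def forward_pass(matching, transition=potts):
    """Accumulate costs over fragments given in processing order.

    matching[i] is {d: M(d)} for fragment i. Returns (costs, back), where
    costs[i][d] is the accumulated cost and back[i][d] is the value q of
    the previous fragment that attains the minimum in the min-convolution.
    """
    costs, back = [dict(matching[0])], [{}]
    for m in matching[1:]:
        prev = costs[-1]
        c, b = {}, {}
        for d in m:
            q_best = min(sorted(prev), key=lambda q: prev[q] + transition(q, d))
            c[d] = prev[q_best] + transition(q_best, d) + m[d]
            b[d] = q_best
        costs.append(c)
        back.append(b)
    return costs, back


def select_forward(costs):
    """Forward-style selection: argmin of the accumulated cost per fragment."""
    return [min(sorted(c), key=c.get) for c in costs]


def select_viterbi(costs, back):
    """Viterbi-style selection: argmin at the last fragment, then back-track."""
    d = min(sorted(costs[-1]), key=costs[-1].get)
    path = [d]
    for b in reversed(back[1:]):
        d = b[d]
        path.append(d)
    return path[::-1]


# Hypothetical matching costs for three fragments and two candidate values.
matching = [{0: 2, 1: 0}, {0: 0, 1: 1}, {0: 0, 1: 20}]
costs, back = forward_pass(matching)
print(select_forward(costs))        # [1, 1, 0]
print(select_viterbi(costs, back))  # [0, 0, 0]
```

In this example the on-the-fly selection commits to d = 1 for the first two fragments before the strong evidence at the third fragment is seen, whereas back-tracking revises those choices at the price of storing the back-pointers and running an additional pass.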
Currently known algorithms that provide high fidelity of the estimated depth information values, in particular disparity or depth values, are computationally complex and not suitable for real-time processing, for instance, on mobile devices. Conversely, currently known simple depth information value estimation algorithms that can run in real time, for instance, on mobile devices, provide only limited fidelity of the results obtained.
Thus, there is a need for an improved image processing apparatus and method, in particular an image processing apparatus and method providing high fidelity of the estimated depth information values in a computationally efficient manner.