Computer vision applications based on 3D imaging (three-dimensional imaging) have gained increasing attention from the scientific community in recent years. This is due, at least in part, to a growing demand in industry for autonomous systems capable of sensing the shape and location of objects in a scene.
In recent years, 3D cameras based on the Time-of-Flight (ToF) principle have become commercially available. Unlike 2D cameras, which provide only a grayscale or color image of the scene (hereinafter referred to as “intensity image”), they measure, for each pixel, the radial distance from a point in the scene to the camera. Other 3D imagers are based on triangulation. Passive triangulation (without active illumination of the scene) is known as stereopsis or stereovision. Active systems employ laser or structured light techniques. Triangulation methods are typically more computationally intensive since the depth value has to be calculated for each pixel. In the case of stereoscopic imaging, this involves solving the so-called correspondence problem (identification of corresponding objects in the two half-images). When a structured light technique is used, several illumination patterns typically have to be processed. Nevertheless, 3D imagers based on stereoscopy or structured light that provide high-resolution depth maps in real time have since become available. Microsoft's Kinect™ is currently the most popular example of a 3D imager using structured light technology.
Whichever of the known depth sensing techniques is used, benefits can be gained by combining the depth images with intensity images from a 2D camera. Today's ToF cameras have much lower resolution than common 2D cameras, and their depth measurement is affected by noise. Fusing the data of a 2D camera and a 3D ToF camera may be used to overcome these limitations. Stereoscopic imagers typically provide depth images with acceptable resolution, but they suffer from another drawback: they cannot reliably assign a depth value to pixels that are seen by only one of the cameras. Furthermore, determination of depth values is less reliable in areas with poor contrast. Both effects may result in the depth image containing areas without depth values or with depth values that must be considered unreliable. A similar problem exists for structured-light imagers. For them to be able to sense any depth, the illumination unit and the camera must be separated by a baseline. This inevitably results in occlusion or shadowing effects. Areas that, as seen from the camera, are shadowed from the illumination unit cannot directly be assigned any depth values.
The present invention proposes a method that addresses these problems. The accuracy and resolution of a depth image (also: “distance image”) are enhanced by fusing the depth image with an intensity image. For the purpose of the present document, it is assumed that pixel matching has already taken place, if necessary. In other words, for each pixel of the depth image there is one corresponding pixel in the intensity image and vice versa, where pixels are considered to correspond if they relate to (image) the same part of the scene.
The method presented herein is based on a further development of the method presented in the article “A New Multi-lateral Filter for Real-Time Depth Enhancement”, by F. Garcia, D. Aouada, B. Mirbach, T. Solignac, B. Ottersten, Proceedings of the 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance, 2011. The method proposed in that article relates to enhancement of depth images acquired with a ToF camera and refines the Joint Bilateral Upsampling (JBU) filter, which is defined by
$$J_1(p) = \frac{\sum_{q \in N(p)} f_S(p,q)\, f_I\big(I(p), I(q)\big)\, R(q)}{\sum_{q \in N(p)} f_S(p,q)\, f_I\big(I(p), I(q)\big)} \qquad \text{(Eq. 1)}$$

where p and q designate pixels, R(.) designates the depth image (R(q) is the depth value of pixel q), and N(p) designates a neighborhood of pixel p. Pixel p may be a position vector p=(i,j)^T, with i and j indicating the row and column, respectively, corresponding to the pixel position. This non-iterative filter formulation is a weighted average of the local neighborhood samples, where the weights are computed based on spatial and radiometric distances between the center of the considered sample and the neighboring samples. Thus, its kernel is decomposed into a spatial weighting term fS(.) that applies to the pixel position p, and a range weighting term fI(.) that applies to the pixel value I(p). The weighting functions fS(.) and fI(.) are generally chosen to be Gaussian functions with standard deviations σS and σI, respectively.
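As an illustration only, the weighted average of Eq. 1 can be sketched in Python with NumPy. The function names `gaussian` and `jbu_filter`, the square window of half-width `radius` used for N(p), and all default parameter values are assumptions made for this sketch and are not taken from the cited article. The depth image R and the intensity image I are assumed to have the same size, i.e. pixel matching has already taken place.

```python
import numpy as np


def gaussian(x, sigma):
    """Unnormalised Gaussian weight (a constant factor would cancel in Eq. 1)."""
    return np.exp(-0.5 * (x / sigma) ** 2)


def jbu_filter(R, I, radius=2, sigma_s=2.0, sigma_i=10.0):
    """Evaluate Eq. 1 at every pixel p of depth image R, guided by intensity image I.

    Assumptions of this sketch: N(p) is a square window of half-width `radius`
    clipped at the image border, and f_S, f_I are Gaussians with standard
    deviations sigma_s and sigma_i.
    """
    R = np.asarray(R, dtype=float)
    I = np.asarray(I, dtype=float)
    rows, cols = R.shape
    J1 = np.empty_like(R)
    for i in range(rows):
        for j in range(cols):
            # Neighbourhood N(p), clipped at the image border
            i0, i1 = max(i - radius, 0), min(i + radius + 1, rows)
            j0, j1 = max(j - radius, 0), min(j + radius + 1, cols)
            qi, qj = np.mgrid[i0:i1, j0:j1]
            # Spatial term f_S(p, q): depends on the pixel distance |p - q|
            f_s = gaussian(np.hypot(qi - i, qj - j), sigma_s)
            # Range term f_I(I(p), I(q)): depends on the intensity difference
            f_i = gaussian(I[i0:i1, j0:j1] - I[i, j], sigma_i)
            w = f_s * f_i
            # The centre pixel always has weight 1, so the denominator is > 0
            J1[i, j] = np.sum(w * R[i0:i1, j0:j1]) / np.sum(w)
    return J1
```

Because the weights are normalised by their sum, a constant depth image is returned unchanged whatever the intensity image looks like. With a small σI, neighbors across an intensity edge receive negligible weight, so a depth edge that coincides with an intensity edge is preserved.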
The bilateral filter principle rests on heuristic assumptions about the relationship between depth and intensity data. These assumptions may lead to erroneous copying of 2D texture into geometries that are actually smooth in the depth image. Furthermore, a second unwanted artifact known as edge blurring appears along depth edges that are not perfectly aligned with corresponding edges in the 2D image. In addition, the measured depth values of the input depth map R may be erroneous in edge pixels due to a mixture of light from the foreground and the background, or due to dynamic edge effects along edges of moving objects. The erroneous values of these pixels should not be considered in the filter.
To address these issues, Garcia et al. use a pixel-weighted average strategy (PWAS) with filters J2 and J3 defined by:
$$J_2(p) = \frac{\sum_{q \in N(p)} f_S(p,q)\, f_I\big(I(p), I(q)\big)\, Q(q)\, R(q)}{\sum_{q \in N(p)} f_S(p,q)\, f_I\big(I(p), I(q)\big)\, Q(q)} \qquad \text{(Eq. 2)}$$

and

$$J_3(p) = \frac{\sum_{q \in N(p)} f_S(p,q)\, f_R\big(R(p), R(q)\big)\, Q(q)\, R(q)}{\sum_{q \in N(p)} f_S(p,q)\, f_R\big(R(p), R(q)\big)\, Q(q)}, \qquad \text{(Eq. 3)}$$

where Q(.) is a credibility map defined by Q(q)=fQ(|∇R(q)|), with fQ(.) being a (preferably Gaussian) weighting function (with variance σQ). The enhanced depth image is proposed to be calculated as:

$$J_4(p) = \big(1 - Q(p)\big) \cdot J_2(p) + Q(p) \cdot J_3(p). \qquad \text{(Eq. 4)}$$
The factor Q(.) takes into account that pixels located at edges of objects in the depth image are likely to cover part of the foreground and of the background at the same time, so that their depth value may be inaccurate or erroneous. Filter J2(.) is thus a cross-bilateral filter, in which the edge-blurring artifact is reduced by the factor Q(.). Filter J3(.) is a bilateral filter of the depth image, in which edge pixels are given less weight by the factor Q(.). The enhanced depth image J4 is obtained by blending J2(.) with J3(.). Eq. 4 uses the credibility map as a blending map, whereby pixels of the depth image with high reliability are taken over in the enhanced depth image essentially unchanged, thus avoiding texture copying.
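The interplay of the credibility map with Eqs. 2 to 4 may be easier to follow in code. The following Python/NumPy sketch is illustrative only and is not the implementation of Garcia et al. The names `credibility_map` and `pwas_filter`, the square window used for N(p), the finite-difference gradient computed with `numpy.gradient`, the treatment of every σ as a Gaussian width parameter, the default values, and the fallback to R(p) when all weights in a window vanish are assumptions of this sketch.

```python
import numpy as np


def gaussian(x, sigma):
    """Unnormalised Gaussian weight with width parameter sigma (assumed form)."""
    return np.exp(-0.5 * (x / sigma) ** 2)


def credibility_map(R, sigma_q):
    """Q(q) = f_Q(|grad R(q)|): close to 1 in smooth regions, small at depth edges."""
    g_row, g_col = np.gradient(np.asarray(R, dtype=float))
    return gaussian(np.hypot(g_row, g_col), sigma_q)


def pwas_filter(R, I, radius=2, sigma_s=2.0, sigma_i=10.0,
                sigma_r=0.1, sigma_q=0.1, eps=1e-12):
    """Blend J2 (Eq. 2) and J3 (Eq. 3) into J4 (Eq. 4) for same-sized R and I."""
    R = np.asarray(R, dtype=float)
    I = np.asarray(I, dtype=float)
    Q = credibility_map(R, sigma_q)
    rows, cols = R.shape
    J4 = np.empty_like(R)
    for i in range(rows):
        for j in range(cols):
            # Neighbourhood N(p): square window clipped at the border (assumption)
            win = (slice(max(i - radius, 0), min(i + radius + 1, rows)),
                   slice(max(j - radius, 0), min(j + radius + 1, cols)))
            qi, qj = np.mgrid[win]
            f_s = gaussian(np.hypot(qi - i, qj - j), sigma_s)
            # Eq. 2 weights: spatial term, intensity range term, credibility
            w2 = f_s * gaussian(I[win] - I[i, j], sigma_i) * Q[win]
            # Eq. 3 weights: spatial term, depth range term, credibility
            w3 = f_s * gaussian(R[win] - R[i, j], sigma_r) * Q[win]
            # Hypothetical safeguard: keep R(p) if every weight has vanished
            J2 = np.sum(w2 * R[win]) / np.sum(w2) if np.sum(w2) > eps else R[i, j]
            J3 = np.sum(w3 * R[win]) / np.sum(w3) if np.sum(w3) > eps else R[i, j]
            # Eq. 4: the credibility map acts as the blending map
            J4[i, j] = (1.0 - Q[i, j]) * J2 + Q[i, j] * J3
    return J4
```

In smooth regions the depth gradient is near zero, so Q(p) is close to 1 and the output is dominated by J3, which does not use the intensity image and therefore cannot copy texture. At depth edges Q(p) is small, so the output follows the intensity-guided J2, in which the unreliable edge pixels themselves contribute little because of the factor Q(q).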