A number of applications require fast and precise depth map evaluations. These applications include, for example, gesture recognition systems, face recognition systems, virtual keyboards, object and person monitoring, and virtual reality games.
Time-of-flight (ToF) depth sensors have become more widespread in recent years. At the high end, some devices like the Kinect2 device from Microsoft provide a high definition depth map up to 1080p (HD). At the low end, other ToF devices provide a low resolution depth map with just one or a few ranging points. In some cases, information related to the reflected intensity, referred to as “signal count,” is also output at a higher resolution.
Even though high resolution depth devices are available, a disadvantage is cost. In sharp contrast, the low resolution devices are typically one or several orders of magnitude less expensive. However, the resolution of a low resolution depth map needs to be increased, such as through upsampling.
One approach to increase the number of simultaneous ranging points of a low resolution depth map is to use spatial upsampling. Spatial upsampling methods include bilinear, weighted average, median, and bicubic interpolation, for example. However, these methods have several shortcomings.
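By way of illustration only, bilinear spatial upsampling of a very low resolution depth map may be sketched as follows. The function name `bilinear_upsample`, the integer `factor` parameter, and the clamping of coordinates at the borders are assumptions made for this sketch and are not taken from any particular device or library.

```python
def bilinear_upsample(depth, factor):
    """Upsample a 2-D list `depth` by an integer `factor` (hypothetical helper).

    Border handling is an assumption: coordinates are clamped, which
    replicates the outermost input values.
    """
    rows, cols = len(depth), len(depth[0])
    out_rows, out_cols = rows * factor, cols * factor
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        # Map the output pixel centre back into input coordinates.
        y = min(max((i + 0.5) / factor - 0.5, 0.0), rows - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        wy = y - y0
        for j in range(out_cols):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), cols - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            wx = x - x0
            # Interpolate horizontally on the two rows, then vertically.
            top = depth[y0][x0] * (1 - wx) + depth[y0][x1] * wx
            bottom = depth[y1][x0] * (1 - wx) + depth[y1][x1] * wx
            out[i][j] = top * (1 - wy) + bottom * wy
    return out


# A 3x3 depth map with a sharp depth step between the second and third columns.
depth_3x3 = [[1.0, 1.0, 3.0],
             [1.0, 1.0, 3.0],
             [1.0, 1.0, 3.0]]
upsampled = bilinear_upsample(depth_3x3, 2)
```

In this sketch, the upsampled rows contain intermediate depths (1.5 and 2.5) that were never measured, so the sharp step in the input is smeared across several output pixels.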
One shortcoming is that the upsampling does not add real information. Straightforward algorithms typically result in blurry images or edge artifacts. More complex algorithms require not only more operations but also larger kernels, which may not be suited to a very low resolution depth map. More generally, border pixels are problematic or neglected in these image processing methods while their proportion may be large with a very low input resolution.
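The proportion of border pixels at a given resolution may be estimated with a short calculation. The helper name `border_fraction` is hypothetical, and a border pixel is assumed here to be any pixel in the outermost row or column of the map.

```python
def border_fraction(rows, cols):
    """Fraction of pixels lying in the outermost row or column (hypothetical helper)."""
    total = rows * cols
    interior = max(rows - 2, 0) * max(cols - 2, 0)
    return (total - interior) / total


low_res = border_fraction(3, 3)         # 8 of the 9 pixels are border pixels
high_res = border_fraction(1080, 1920)  # well under one percent
```

Under this assumption, a kernel that discards or degrades border pixels affects nearly the entire output for a 3×3 input, whereas the same kernel affects a negligible share of a high definition map.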
Another approach to increase the number of simultaneous ranging points of a low resolution depth map is to use super resolution from multiple acquisitions. Super resolution combines the different information acquired at different instants. Similarly, simultaneous acquisitions from different viewpoints can be used as inputs. However, super resolution is not generic since it relies on motion between acquisitions and needs aliased inputs, or relies on the availability of several synchronized devices in the case of simultaneous acquisitions, which is then more costly. The approach described in an IEEE Transactions on Image Processing article titled “Fast and Robust Multiframe Super Resolution” is computationally heavy and iterative. The case of a 3×3 depth map is even more difficult due to the very limited input resolution, since a global motion vector would need to be estimated from two successive 3×3 inputs.
Yet another approach to increase the number of simultaneous ranging points of a low resolution depth map is to use joint bilateral upsampling. Joint bilateral upsampling makes use of two different signals available at two different resolutions. In the case of a ToF device, these would be the signal count map in parallel with the depth map, with the signal count map having a higher resolution than the depth map. Joint bilateral upsampling works better than traditional spatial upsampling by following edges from the signal count map, and is more generic and less complex than super resolution. However, joint bilateral upsampling is blind to the nature of its input data and thus does not take advantage of known properties of the device. It is also a heuristic approach that needs tuning. In addition, joint bilateral upsampling remains sensitive to the fact that most input values for a 3×3 depth map are border pixels.
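By way of illustration only, a minimal form of joint bilateral upsampling may be sketched as follows. The function name, the Gaussian spatial and range weights, the parameters `sigma_s`, `sigma_r`, and `radius`, the normalization of the signal count map to the range 0 to 1, and the requirement that the signal count map be exactly `factor` times larger than the depth map in each dimension are all assumptions made for this sketch.

```python
import math


def joint_bilateral_upsample(depth_lo, guide_hi, factor,
                             sigma_s=1.0, sigma_r=0.1, radius=1):
    """Upsample `depth_lo` guided by the higher resolution map `guide_hi`.

    Hypothetical sketch: `guide_hi` stands in for a signal count map assumed
    to be normalized to [0, 1] and `factor` times larger than `depth_lo`.
    """
    rows_lo, cols_lo = len(depth_lo), len(depth_lo[0])
    rows_hi, cols_hi = len(guide_hi), len(guide_hi[0])
    out = [[0.0] * cols_hi for _ in range(rows_hi)]
    for i in range(rows_hi):
        for j in range(cols_hi):
            # Position of the output pixel in low resolution coordinates.
            yl = (i + 0.5) / factor - 0.5
            xl = (j + 0.5) / factor - 0.5
            cy, cx = int(round(yl)), int(round(xl))
            num = den = 0.0
            for qy in range(cy - radius, cy + radius + 1):
                for qx in range(cx - radius, cx + radius + 1):
                    if not (0 <= qy < rows_lo and 0 <= qx < cols_lo):
                        continue  # the window is truncated at the border
                    # Guide sample taken near the centre of low resolution cell q.
                    gy = min(int((qy + 0.5) * factor), rows_hi - 1)
                    gx = min(int((qx + 0.5) * factor), cols_hi - 1)
                    ds2 = (qy - yl) ** 2 + (qx - xl) ** 2
                    dr = guide_hi[i][j] - guide_hi[gy][gx]
                    # Spatial weight times range weight on the guide signal.
                    w = (math.exp(-ds2 / (2 * sigma_s ** 2))
                         * math.exp(-dr * dr / (2 * sigma_r ** 2)))
                    num += w * depth_lo[qy][qx]
                    den += w
            # Fall back to the nearest depth value if all weights underflow.
            out[i][j] = num / den if den > 0.0 else depth_lo[cy][cx]
    return out


# 3x3 depth map with a step, and a 6x6 guide whose edge lies at column 4.
depth_3x3 = [[1.0, 1.0, 3.0] for _ in range(3)]
guide_6x6 = [[0.0 if j < 4 else 1.0 for j in range(6)] for _ in range(6)]
jbu = joint_bilateral_upsample(depth_3x3, guide_6x6, 2)
```

In this sketch, the output depth stays close to 1.0 up to the guide edge and close to 3.0 beyond it, because the range weight suppresses depth samples whose guide value differs from that of the output pixel. The values chosen for `sigma_s` and `sigma_r` illustrate the tuning that the approach requires, and the truncated window shows how few samples contribute near the border of a 3×3 input.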
Even in view of the above described approaches, there is still a need to improve upsampling of a low resolution depth map.