A number of applications, such as three-dimensional (3D) and free-viewpoint TV, augmented reality, 3D visualization, 3D shape and colour scanning, simultaneous localization and mapping (SLAM), image segmentation and many others, use the view-plus-depth image format, also known as the RGB+D format, as a main input source. In this format, each pixel of a digital colour image is augmented with a corresponding depth value that specifies the distance between the corresponding point in the scene and the optical centre of the camera. The performance of these applications may depend directly on the quality of the supplied depth maps.
Many consumer-level 3D cameras may have two camera sensors, one dedicated to depth acquisition and the other to capturing a high-quality colour image. In order to construct a combined view-plus-depth frame, the depth map may need to be aligned to the main (reference) colour view, and then up-sampled and denoised, if necessary. As the depth map itself can act as a mapping function, its alignment with the colour frame may be done through 3D image warping. Provided that the colour and depth cameras are calibrated (their poses are known), such warping can easily be done in a per-pixel fashion. Some depth sensors, such as time-of-flight (ToF) sensors, may have a significantly lower native resolution than the reference colour camera, which may make a depth up-sampling step obligatory.
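The per-pixel warping described above can be illustrated with a minimal sketch. It assumes an ideal pinhole model without lens distortion; the function name `warp_depth_pixel`, the intrinsics tuples `(fx, fy, cx, cy)` and the pose parameters `R` and `t` are hypothetical names introduced here for illustration and are not taken from the text.

```python
import math

def warp_depth_pixel(u, v, dist, depth_cam, colour_cam, R, t):
    """Warp one depth-sensor pixel into the colour camera (illustrative sketch).

    u, v       : pixel position on the depth sensor
    dist       : distance from the depth camera's optical centre to the scene point
    depth_cam  : (fx, fy, cx, cy) pinhole intrinsics of the depth camera (assumed)
    colour_cam : (fx, fy, cx, cy) pinhole intrinsics of the colour camera (assumed)
    R, t       : rotation (3x3 nested list) and translation (3 values) that map
                 depth-camera coordinates to colour-camera coordinates
    Returns (x, y, new_dist), where (x, y) is a generally non-integer position on
    the colour sensor, or None if the point lies behind the colour camera.
    """
    fx, fy, cx, cy = depth_cam
    # Direction of the viewing ray through pixel (u, v).
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]
    norm = math.sqrt(sum(c * c for c in ray))
    # Scene point in depth-camera coordinates: walk `dist` along the unit ray.
    p = [dist * c / norm for c in ray]
    # Rigid transform into the colour camera's coordinate frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        return None
    fxc, fyc, cxc, cyc = colour_cam
    # Perspective projection onto the colour sensor.
    x = fxc * q[0] / q[2] + cxc
    y = fyc * q[1] / q[2] + cyc
    # Distance of the same point from the colour camera's optical centre.
    new_dist = math.sqrt(sum(c * c for c in q))
    return x, y, new_dist
```

With an identity rotation, zero translation and identical intrinsics, each pixel maps onto itself; any non-trivial pose generally yields non-integer target positions on the colour sensor.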
Apart from lower resolution, depth sensors may also be prone to noise and other types of errors, including systematic errors. For time-of-flight cameras, these may be especially visible on surfaces with low or non-Lambertian reflectance, or on some materials, such as hair. Another source of noise in time-of-flight cameras is connected with the power of illumination. For instance, when such a camera is used in a mobile environment (e.g. in a handheld device), infra-red emitters may not be able to illuminate at full power because of restricted power consumption, and hence the sensed reflectance may be degraded. Other active depth sensors, such as those based on triangulation, may have depth errors connected with a correlation pattern that is too sparse, which may result in wrong shapes of object boundaries and over-smoothing of small details.
One problem arising for many two-sensor 3D cameras is a circular dependency: it may not be possible to filter the depth map using the available colour image until the depth map is aligned with the colour view, and it may not be possible to align the depth map before it has been filtered.
Another problem, connected with direct depth map projection, is that a non-regular-to-regular grid resampling may be needed: the projected depth values fall at non-regular positions on the colour camera sensor and may therefore have to be resampled to a regular pixel grid.
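As a rough illustration of the resampling problem, the sketch below assigns each projected sample to its nearest pixel and, where several samples land in the same pixel, keeps the smallest depth so that the closest surface wins. This is only one simple strategy, chosen here for brevity and not described in the text; the function name `resample_to_grid` and the `(x, y, depth)` sample layout are assumptions. Pixels that receive no sample remain `None`, which shows why hole filling or interpolation may still be required afterwards.

```python
import math

def resample_to_grid(samples, width, height):
    """Resample scattered depth samples onto a regular pixel grid (sketch).

    samples : iterable of (x, y, depth) with non-integer sensor positions
    Returns a height x width nested list; cells without any sample stay None.
    """
    grid = [[None] * width for _ in range(height)]
    for x, y, depth in samples:
        # Nearest pixel centre (round half up).
        col = int(math.floor(x + 0.5))
        row = int(math.floor(y + 0.5))
        if 0 <= col < width and 0 <= row < height:
            current = grid[row][col]
            # Keep the nearest surface when samples collide in one pixel.
            if current is None or depth < current:
                grid[row][col] = depth
    return grid
```

For example, two samples falling near pixel (0, 0) with depths 3.0 and 2.0 leave 2.0 in that cell, while samples projected outside the sensor area are discarded.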