Depth maps can be produced by means of stereoscopic images, structured light, LIDAR, depth from focus (DFF), or depth from defocus (DFD).
Depth information can be estimated from a pair of stereoscopic images of the same scene captured by two (typically identical) light sensors or cameras displaced from each other by a distance known as the baseline. A disparity map is extracted from the image pair (also known as a stereo pair) by matching corresponding points in the two images. Disparity is the relative displacement of corresponding points in the stereo pair and can be converted to depth z using Equation (1), below:
$$z = \frac{f\,b}{d} \qquad (1)$$
where f is the focal length, b is the baseline and d is the disparity.
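As an illustration only, Equation (1) may be applied per pixel as in the following minimal sketch. The function name `disparity_to_depth`, the choice of units (focal length in pixels, baseline in metres) and the use of `None` for pixels without a valid disparity are assumptions made for the example and are not taken from any particular depth system.

```python
def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a 2D disparity map (in pixels) to depth (in metres) via z = f*b/d.

    Pixels with a missing or non-positive disparity have no valid
    correspondence, so they are given no depth estimate (None).
    """
    return [
        [
            focal_length_px * baseline_m / d if d is not None and d > 0 else None
            for d in row
        ]
        for row in disparity
    ]


# Hypothetical camera: 700 px focal length, 0.1 m baseline.
example_depth = disparity_to_depth([[35.0, 70.0], [0.0, 14.0]], 700.0, 0.1)
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error for distant objects than for near ones.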
When estimating depth from structured light, a light source (typically, an infrared (IR) light source) may be used to project a known pattern of light onto the scene. A sensor displaced by a certain distance from the light source may be used to capture the resulting image. By measuring the distortion of the known pattern, a depth map of the scene is estimated.
With LIDAR, laser light is used to illuminate a scene while a laser detector picks up the reflection. Given the known speed of light, the depth of a scene can be estimated by measuring the time-of-flight of the laser signal in the case of a scanning LIDAR device, and by measuring the amount of laser light received in the case of a scanner-less LIDAR device (whose image sensor and laser source each have a shutter that opens and closes at the same rate).
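For the scanning case, the time-of-flight relationship can be sketched as follows. The sketch assumes that the measured time is the round-trip time of the laser pulse (from the source to the surface and back to the detector), so the distance travelled is halved; the function name `tof_to_depth` is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_to_depth(round_trip_time_s):
    """Estimate depth in metres from a round-trip time-of-flight in seconds.

    The pulse covers the source-to-surface distance twice, hence the
    division by two.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

Under this assumption, a round-trip time of about 6.67 nanoseconds corresponds to a depth of roughly one metre.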
The depth map estimated by the above methods provides three-dimensional (3D) information of a scene. Objects in a scene typically form connected regions of pixels with slowly varying depth, and objects at different depths appear as separate connected components in the depth maps. The boundaries of these connected components, where a significant change in the estimated depth occurs, may be referred to as depth boundaries.
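One simple way to locate such depth boundaries, offered here only as a sketch, is to mark pixels whose depth differs from that of a horizontally or vertically adjacent pixel by more than a threshold. The function name `depth_boundaries`, the `threshold` parameter and the use of `None` for pixels without a depth estimate are assumptions made for the example.

```python
def depth_boundaries(depth, threshold):
    """Return a boolean mask marking pixels on a depth boundary.

    A pixel is marked when its depth differs from that of its right or
    lower neighbour by more than `threshold`; both pixels of such a pair
    are marked. Pairs involving a pixel with no estimate (None) are skipped.
    """
    rows, cols = len(depth), len(depth[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbours
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols:
                    continue
                a, b = depth[r][c], depth[r2][c2]
                if a is None or b is None:
                    continue
                if abs(a - b) > threshold:
                    mask[r][c] = True
                    mask[r2][c2] = True
    return mask
```

What counts as a "significant" change depends on the scene and the sensor, so the threshold in this sketch would have to be chosen per application.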
Depth maps produced using the above methods have some problems in common, namely:
        depth boundaries that do not align with edges of the objects in an associated image,
        regions (including occluded regions) that have no depth estimates, and
        depth maps that have lower resolution than the corresponding images.
The misalignment between depth boundaries and object edges is more pronounced:
        for moving objects due to motion blur,
        for objects that are closer to the camera, and
        along one axis (e.g. in the horizontal compared to the vertical direction) due to the relative position of the two sensors (two colour light sensors, one colour light sensor and one laser sensor, or one colour light sensor and one IR sensor), as well as the relative position of the IR/laser source and the associated sensor.
FIG. 2A shows a sample depth map 200 obtained using a known method and some common problems of the depth estimates of the map 200. Depth map 200 shows the depth profile of an open-plan office that has a suspended ceiling around light fittings and ductwork. Depth map 200 also shows a person in the foreground.
Large parts of depth map 200, such as the regions 210 and 220, have no depth estimates. A number of reasons may explain the lack of a depth estimate for a pixel in the depth map 200, including:
        the associated object is out of (depth) range,
        the pixel falls outside the view covered by the sensor,
        the associated surface does not reflect the projected IR pattern, or
        the pixel lies in an occlusion region (e.g., occlusion region 220), where the light source is blocked from illuminating the object at that position in the captured image.
In the case of a LIDAR system, some surfaces may not reflect the laser light and, in the case of a stereo depth system, some surfaces may have too little texture to enable a correspondence search across the stereo pair, resulting in no depth estimates for those surfaces.
Some objects do not reflect the IR light well, such as the dark monitor screen at 230, and the resulting depth estimates are very noisy.
Reference 240 in FIG. 2A indicates the depth boundary of the right arm of the person close to the camera. FIG. 2B shows an expanded view 260 of 240. As seen in FIG. 2B, the image of the arm is overlaid on top of the depth estimates. The depth values of the arm 262 can be seen to bleed from the actual edge 264 of the arm into the background, in some places by as much as twenty (20) pixels.
Reference 250 of depth map 200 indicates a cable at a considerable distance from a camera used to capture the image associated with the map 200. FIG. 2C shows an expanded view 270 of 250. In 270, the image of the cable 274 is overlaid on top of the depth estimates of the cable 272. It can be seen that, while the depth estimates of the cable are not very accurate and the depth boundaries do not align well with the actual edges of the cable, the depth values of the cable 272 do not bleed much into the background.
Depth estimates near depth boundaries are unreliable and do not align well with the actual edges of the associated objects. To improve the alignment, some further methods drop the estimated depth for a band of pixels at the depth boundaries, and replace the depth values with the estimated depth of nearby, similarly coloured pixels that are outside the band. For these further methods, the width of the band is fixed for an entire depth map, and an appropriate width has to be determined for each depth map. As described above, the fixed width only works well over a sub-range of depth. For instance, a width that works well for objects close to the camera will not work well for more distant objects and vice versa.
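The fixed-width band approach described above may be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the implementation of any particular known method: the colour image is reduced to a single intensity value per pixel, the band is taken to be all pixels within `band_width` pixels of a marked boundary pixel, and each band pixel takes the depth of the most similarly coloured out-of-band pixel within `search_radius` pixels. All names and parameters are hypothetical.

```python
def refill_boundary_band(depth, colour, boundary, band_width, search_radius):
    """Drop depth values in a fixed-width band around depth boundaries and
    refill them from nearby, similarly coloured pixels outside the band."""
    rows, cols = len(depth), len(depth[0])

    def window(r, c, radius):
        # All pixel coordinates within `radius` of (r, c), clipped to the map.
        for r2 in range(max(0, r - radius), min(rows, r + radius + 1)):
            for c2 in range(max(0, c - radius), min(cols, c + radius + 1)):
                yield r2, c2

    # Step 1: the band is every pixel within band_width of a boundary pixel.
    in_band = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if boundary[r][c]:
                for r2, c2 in window(r, c, band_width):
                    in_band[r2][c2] = True

    # Step 2: replace each band pixel's depth with that of the most
    # similarly coloured out-of-band pixel that has a depth estimate.
    out = [row[:] for row in depth]
    for r in range(rows):
        for c in range(cols):
            if not in_band[r][c]:
                continue
            best = None  # (colour difference, depth) of best candidate so far
            for r2, c2 in window(r, c, search_radius):
                if in_band[r2][c2] or depth[r2][c2] is None:
                    continue
                diff = abs(colour[r][c] - colour[r2][c2])
                if best is None or diff < best[0]:
                    best = (diff, depth[r2][c2])
            out[r][c] = best[1] if best is not None else None
    return out
```

In this sketch `band_width` is a single value for the whole map, which is the limitation noted above: a band wide enough to cover the bleeding around a near object discards more depth values than necessary around a distant one.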
Other methods refine the depth map by minimising a cost defined by a cost function with a smoothness regulariser, which enforces smooth variation in depth across the scene unless there is a large change in pixel colour. Because they do not take into account the reliability issues of depth estimates at depth boundaries, these other methods typically over-smooth depth boundaries, in particular where the change in pixel colour is small.
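A cost function of this general kind may, for example, take the following form. The specific terms are an assumption made for illustration and are not taken from any particular method: \hat{D}_p is the initial depth estimate at pixel p, D_p is the refined depth, I_p is the pixel colour, \mathcal{N} is the set of neighbouring pixel pairs, and \lambda and \sigma are hypothetical tuning parameters.

```latex
E(D) = \sum_{p} \left( D_p - \hat{D}_p \right)^2
     + \lambda \sum_{(p,q) \in \mathcal{N}} w_{pq} \left( D_p - D_q \right)^2,
\qquad
w_{pq} = \exp\!\left( -\frac{\lVert I_p - I_q \rVert^2}{2\sigma^2} \right)
```

The weight w_{pq} approaches one when neighbouring pixels have similar colours, so the smoothness term dominates there; at a depth boundary where the colour change is small, this pulls D_p and D_q together, which is the over-smoothing behaviour noted above.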
In many applications, such as 3D visualisation and free-viewpoint video, accurate alignment between depth boundaries and object edges is needed. Poor alignment between depth boundaries and object edges can result in highly disagreeable visual artefacts in the output images or videos.