Depth-from-defocus (DFD) methods measure depth by comparing the amount of blur in corresponding tiles of two or more images taken with different camera settings, such as different focus or different aperture settings. The size of the tiles affects the depth estimates. The larger the tiles, the less noisy the depth estimates are over regions of similar depth. On the other hand, the spatial resolution at depth boundaries is reduced.
Along depth boundaries, the DFD assumption of constant depth (over a tile) is also violated and the depth estimates are inaccurate. DFD methods also generate very noisy or no depth estimates in low texture regions. As a result, DFD depth maps are often refined to (i) reduce noise in depth estimates, and (ii) align depth boundaries with object edges.
Superpixel segmentation segments an image into superpixels which are collections of connected pixels with similar characteristics such as colour and texture. The boundaries of the superpixels often (but not always) align well with object edges. Their use in depth map refinement, in particular depth boundary refinement, can be advantageous.
Superpixel segmentation has been applied to both low resolution depth maps and their corresponding high resolution images. Heuristics are then used to transfer the boundaries of the superpixels of the high resolution image to the corresponding superpixels of the depth map to improve the resolution of the depth boundaries in the depth map. This approach assumes that the superpixel segmentation is correct. However, superpixel boundaries do not always align with object edges. At depth boundaries where the colour and/or texture are similar across the boundary, a superpixel may include segments of different objects on either side of the boundary. Another problem with superpixel segmentation is that fine structures in an image, such as hair strands, flower stems, etc., that are of a scale finer than the superpixels are combined with other objects to form a superpixel and cannot be separated.
Image matting has been used to refine the depth boundaries obtained from superpixels. Image matting attempts to separate foreground objects of an image from the background of the image. Image matting typically involves estimation of an alpha matte which specifies the full or partial pixel coverage of the background by foreground objects. An image, I, can be represented as a combination of a foreground image, F, and a background image, B, such that the colour value of a pixel p, $I_p$, is given by

$$I_p = \alpha_p F_p + (1 - \alpha_p) B_p \qquad (1)$$

where $\alpha_p$ is the alpha value of p (in the alpha matte), and $F_p$, $B_p$ are the colour values of the foreground image, F, and the background image, B, at pixel p, respectively.
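By way of illustration only, equation (1) can be evaluated for a single pixel as sketched below. The function name `composite_pixel` and the representation of a colour value as a tuple of channel values are assumptions made for this example and do not come from any particular matting implementation.

```python
def composite_pixel(alpha, foreground, background):
    """Evaluate equation (1) for one pixel.

    alpha      -- alpha value of the pixel, expected in the range 0 to 1
    foreground -- colour value F_p as a tuple of channel values (assumed layout)
    background -- colour value B_p as a tuple of channel values (assumed layout)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the range 0 to 1")
    # I_p = alpha_p * F_p + (1 - alpha_p) * B_p, applied per colour channel
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


# A pixel half covered by a foreground object mixes the two colours equally.
mixed = composite_pixel(0.5, (200, 100, 0), (0, 100, 200))
```

An alpha value of 1 returns the foreground colour unchanged and an alpha value of 0 returns the background colour unchanged, which corresponds to full and no coverage of the background, respectively.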
To perform alpha matting, a trimap is typically first defined. The trimap divides the image into three non-overlapping regions, namely a foreground region in which the image is believed, with high probability, to be foreground, a background region in which the image is believed, with high probability, to be background, and an unknown region in which it is uncertain how much the foreground and the background images contribute to the pixels' colour. The alpha values of all pixels in the foreground region are set to 1 and the alpha values of all pixels in the background region are set to 0. Only the alpha values of the pixels in the unknown region, which fall in the range of 0 to 1, are estimated, by comparing the colour of each such pixel with sample pixels from the nearby foreground and background regions. The estimated alpha value for each pixel measures the relative contribution of the foreground image and the background image to the pixel. In depth refinement, the estimated alpha value can be used as a measure of the probability of a pixel being part of the foreground. By thresholding the estimated alpha values, pixels in the unknown region can be classified as foreground or background. For instance, pixels with an estimated alpha value above 0.5 can be classified as foreground, while pixels with an estimated alpha value equal to or less than 0.5 can be classified as background.
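The thresholding step described above can be sketched as follows. This is a minimal example only: the region codes "F", "B" and "U", the function name `classify_unknown`, and the use of nested lists for the trimap and the alpha matte are assumptions made for illustration, and the estimation of the alpha values themselves is not shown.

```python
def classify_unknown(trimap, alpha, threshold=0.5):
    """Resolve unknown trimap pixels into foreground or background.

    trimap -- 2-D list of region codes: "F", "B" or "U" (hypothetical encoding)
    alpha  -- 2-D list of estimated alpha values, same shape as trimap
    """
    result = []
    for region_row, alpha_row in zip(trimap, alpha):
        row = []
        for region, a in zip(region_row, alpha_row):
            if region == "U":
                # Above the threshold is foreground; equal or below is background.
                row.append("F" if a > threshold else "B")
            else:
                # Foreground and background regions keep their assignment.
                row.append(region)
        result.append(row)
    return result


labels = classify_unknown([["F", "U", "U", "B"]], [[1.0, 0.7, 0.5, 0.0]])
```

In this sketch a pixel whose estimated alpha value is exactly 0.5 is classified as background, matching the example threshold given above.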
For superpixel based depth refinement, the trimap required for image matting can be obtained by applying heuristics to select different subsets of the superpixels obtained from a high resolution image for the foreground, background and unknown region of the trimap.
For instance, depth obtained from a low-resolution depth map such as 110 in FIG. 1B can be averaged over each superpixel such as 201 in a superpixel representation 200 of the high resolution image 100. Two depth thresholds, namely a lower and an upper depth threshold, can be pre-defined or dynamically determined to divide the superpixel segmentation into the three regions of the trimap. Superpixels whose average depth is lower (i.e. less) than the lower threshold will be defined as foreground, superpixels whose average depth is higher (i.e. greater) than the upper threshold will be defined as background, while the remaining superpixels are defined as unknown.
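The two-threshold heuristic described above can be sketched as follows. This is an illustrative example only: it assumes that the low-resolution depth map has already been resampled to the resolution of the superpixel label map, and the function name `build_trimap`, the region codes and the nested-list data layout are hypothetical.

```python
from collections import defaultdict

FOREGROUND, BACKGROUND, UNKNOWN = "F", "B", "U"  # hypothetical region codes


def build_trimap(labels, depth, lower, upper):
    """Assign each superpixel to a trimap region from its average depth.

    labels -- 2-D list of superpixel identifiers, one per pixel
    depth  -- 2-D list of depth values, assumed resampled to the same shape
    lower, upper -- the lower and upper depth thresholds
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Accumulate the depth of every pixel into its superpixel.
    for label_row, depth_row in zip(labels, depth):
        for label, d in zip(label_row, depth_row):
            sums[label] += d
            counts[label] += 1

    region = {}
    for label in sums:
        mean_depth = sums[label] / counts[label]
        if mean_depth < lower:
            region[label] = FOREGROUND
        elif mean_depth > upper:
            region[label] = BACKGROUND
        else:
            region[label] = UNKNOWN

    # Every pixel inherits the region of the superpixel that contains it.
    return [[region[label] for label in row] for row in labels]


labels = [[0, 0, 1],
          [2, 2, 1]]
depth = [[1.0, 1.2, 5.0],
         [9.0, 9.5, 5.5]]
trimap = build_trimap(labels, depth, lower=2.0, upper=8.0)
```

Because every pixel inherits the assignment of its superpixel, any segmentation error inside a superpixel assigned to the foreground or the background is carried directly into the trimap.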
Selecting different subsets of the superpixels for the three regions of a trimap of the high resolution image presents a number of difficult problems. First of all, if the foreground superpixels are not selected correctly, that is, if they are not 100% foreground, then superpixel segmentation errors cannot be corrected by the subsequent image matting. This is especially the case for small background regions surrounded by a foreground object, or vice versa. Those regions can become hidden away and locked in the interior of a superpixel, and would thus be given the same foreground or background assignment as the surrounding pixels.
If the scale (i.e. the average size) of the superpixels is set to a small value, more superpixels will be wrongly assigned to the foreground and background at depth boundaries, due to the low resolution of the depth map. This can severely affect the accuracy of the alpha matte. If the scale of the superpixels is set to a large value, the unknown region will be unnecessarily large, decreasing the image matting accuracy, since the foreground and background samples for computing the alpha values have to be picked from adjacent foreground and background superpixels that are further away from an unknown pixel in the superpixel in question. The larger unknown region also increases the amount of computation required.