One embodiment of depth image formation is foreground extraction or background suppression, which has been a topic in the composite photography and cinematography industry for many years. For instance, in U.S. Pat. No. 3,778,542 (issued Dec. 11, 1973 to L. C. Hanseman and entitled “Blue screen travelling matte system”), a blue screen travelling matte system is used to create special photographic effects. In this design, a particular selectable saturated color appearing in the simultaneous red, blue and green video output signals of an electronic color camera is sensed and removed from the video signals by electronic subtraction of the selected color. The output red, blue and green video signals derived from a second electronic color camera are substituted for the removed saturated color. The final composite red, blue and green video output signals therefore contain picture elements from both cameras, combined in a manner such that the specific saturated color from the first camera is completely eliminated and replaced by picture elements derived from the second camera. One of the limitations of this system is that a uniformly colored (blue, green, or any constant backing color) background is required in order to extract the region of interest (for instance, a human figure in a newscast). This requirement in turn demands a constrained, structured environment, which limits the usage of this technology to very controlled situations. Many variations of the blue screen method have been developed over the years (see, e.g., U.S. Pat. Nos. 5,812,214; 5,251,016; and 4,629,298), but they all share the limitation described above.
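The substitution principle described above can be illustrated with a minimal digital sketch. It is not the circuit of the cited patent, which operates on video signals by electronic subtraction; here a per-pixel color distance test stands in for the sensing step. The function name `chroma_key`, the distance measure, and the `tolerance` value are all assumptions chosen for illustration.

```python
# Illustrative chroma-key sketch (hypothetical names and threshold).
# Images are lists of rows; each pixel is an (r, g, b) tuple of 0-255 ints.

def chroma_key(first_camera, second_camera, key_color, tolerance=60):
    """Replace pixels of the first image that are close to key_color
    with the pixels at the same location in the second image."""
    composite = []
    for fg_row, bg_row in zip(first_camera, second_camera):
        out_row = []
        for fg_px, bg_px in zip(fg_row, bg_row):
            # Sum of absolute channel differences from the backing color.
            distance = sum(abs(c - k) for c, k in zip(fg_px, key_color))
            # Backing-color pixels are substituted; all others are kept.
            out_row.append(bg_px if distance <= tolerance else fg_px)
        composite.append(out_row)
    return composite


BLUE = (0, 0, 255)
first_camera = [[BLUE, (200, 150, 120)],
                [(10, 5, 250), BLUE]]
second_camera = [[(1, 1, 1), (2, 2, 2)],
                 [(3, 3, 3), (4, 4, 4)]]
result = chroma_key(first_camera, second_camera, BLUE)
```

The sketch also makes the limitation visible: any subject pixel whose color falls within the tolerance of the backing color would be replaced as well, which is why a uniform, controlled background is needed.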
It is understood that the purpose of imposing the aforementioned background restrictions is to compensate for the lack of sufficient information in the background suppression process. If these restrictions are to be removed, other information must be included so that the problem remains solvable. For example, in Tsai et al. (“Segmenting focused objects in complex visual images,” Pattern Recognition Letters, 19, pp. 929–940, 1998) the measurement of defocus of object edges in an image is used to separate complex foreground and background objects. Thus Tsai et al. introduce the notion of spatial separation by looking at the degree of defocusing of the image objects. It is known that a two-dimensional planar intensity image is a perspective projection of a three-dimensional scene. In this projection the degrees of freedom are reduced from three to two, so the spatial information in the third dimension appears to be lost. This lost spatial information is physically the distance (depth) from the scene to the sensing device, that is, the camera. In fact, the depth information is embedded in the original image pixel locations, in that these locations are tightly related to the depth of the corresponding 3D scene points. This spatial information, which is lost in the 2D projection, can be recovered by searching for corresponding points (pixel locations) in a plurality of displaced intensity images of the scene.
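The search for corresponding points can be sketched, under simplifying assumptions, as a one-dimensional window match between two displaced intensity profiles. The sum-of-absolute-differences cost, the window size, and the function names below are illustrative choices, not a method taken from the cited references. The conversion from disparity to depth additionally assumes two parallel pinhole cameras with a known focal length (in pixels) and baseline, which the text above does not specify.

```python
# Hypothetical sketch: 1D correspondence search by window matching.

def find_disparity(reference, displaced, index, half_window=1, max_disparity=4):
    """Return the shift that best aligns the window centred at `index`
    in `reference` with a window in `displaced` (minimum SAD cost)."""
    best_shift, best_cost = None, float("inf")
    lo, hi = index - half_window, index + half_window
    for shift in range(max_disparity + 1):
        if lo < 0 or hi >= len(reference) or hi + shift >= len(displaced):
            continue  # window would fall outside one of the profiles
        cost = sum(abs(reference[i] - displaced[i + shift])
                   for i in range(lo, hi + 1))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def depth_from_disparity(disparity, focal_length_px, baseline):
    """Assumed parallel-camera relation: depth = focal length * baseline / disparity."""
    return focal_length_px * baseline / disparity


# The same intensity pattern appears two samples later in the second profile.
reference = [10, 10, 80, 200, 90, 10, 10, 10, 10, 10]
displaced = [10, 10, 10, 10, 80, 200, 90, 10, 10, 10]
disparity = find_disparity(reference, displaced, index=3)
depth = depth_from_disparity(disparity, focal_length_px=400.0, baseline=0.25)
```

Under the assumed relation, nearer scene points produce larger disparities, which is the sense in which pixel locations are tied to depth.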
FIG. 1 illustrates an exemplary background suppression system equipped with a pair of cameras 11a and 11b that capture two color images: top image 13b and bottom image 13a. Notice that the contents in the two images, e.g., the respective person images 14a and 14b and computer images 15a and 15b, have a vertical dislocation if the edges of the image frames are aligned. This dislocation is called global disparity, and it is a function of the average distance of the scene from the camera. The system needs to find the individual disparity corresponding to each visible surface point in the scene so that a depth image can be produced. The value of each pixel in the depth image represents the distance to the corresponding scene point that is projected to that pixel location. In foreground extraction situations, the depth image is usually displayed as a gray scale image 10 as shown in FIG. 1, although it could also be displayed as a color image if the gray scale is color-coded. The depth image 10 in FIG. 1 reveals that a person 17 is in the foreground with a higher gray scale value and a computer 16 is in the background with a lower gray scale value. Intuitively, the foreground can be separated from the background based on such depth values. The separation of foreground and background can lead to the formation of a foreground mask image 18 showing a depth mask 19. The mask 19 is then used to select the corresponding person region 21 of the bottom intensity image 13a, and thereby produce a foreground image. The same mask is also used in compositing images as shown in FIG. 2, where the person 21 is added to the scene of a door 31. In this case, the foreground mask image 18, with the person depth mask 33, is used to suppress a portion of the background in the door image 31, thereby generating an intermediate image 34 in which a portion 35 of the door 36 is blocked out so that a person region 21 may be substituted in its place in the resultant composite image 41.
Notice that the suppressed background is not a constant backing color scene.
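The mask formation and compositing steps described for FIGS. 1 and 2 can be sketched as follows. This is a simplified illustration that assumes a clean depth image in which the foreground carries the higher gray scale value, as in FIG. 1; the fixed threshold and the function names `foreground_mask` and `composite` are hypothetical.

```python
# Hypothetical sketch: depth threshold -> foreground mask -> composite.
# Images are lists of rows of integer values.

def foreground_mask(depth_image, threshold):
    """Mark pixels whose depth gray value is at or above the threshold as foreground (1)."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in depth_image]


def composite(mask, foreground_image, background_image):
    """Take foreground pixels where the mask is set and background pixels elsewhere."""
    return [[fg if m else bg
             for m, fg, bg in zip(mask_row, fg_row, bg_row)]
            for mask_row, fg_row, bg_row
            in zip(mask, foreground_image, background_image)]


depth_image = [[40, 200, 40],
               [40, 210, 220]]     # higher value = nearer, as in FIG. 1
person_image = [[11, 12, 13],
                [14, 15, 16]]      # stand-in for intensity image 13a
door_image = [[91, 92, 93],
              [94, 95, 96]]        # stand-in for door image 31

mask = foreground_mask(depth_image, threshold=128)
composite_image = composite(mask, person_image, door_image)
```

Because the selection depends only on depth, the replaced background may have any content; no constant backing color enters the computation.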
From another perspective, separating foreground and background is essentially an image segmentation task that is formidable without a model, especially where there is a complex background. For example, FIG. 3 presents a scene with a person 63 in the foreground and a face picture 62 in the background. Usually, a face model would be used to single out the person in the image. If so, the face picture in the background will still be classified as part of the foreground and be selected as well. However, with the help of depth information, background suppression for this kind of scene would be possible. Accordingly, using the depth information of a scene is an effective way to extract the foreground or suppress the background of the scene image. The key issues, however, are the acquisition and processing of the depth information. Conventional depth recovery algorithms (see S. B. Marapane and M. M. Trivedi, “Region-based stereo analysis for robotic applications,” IEEE Trans. Systems, Man, and Cybernetics, 19(6): 1447–1464, 1989, and S. B. Marapane and M. M. Trivedi, “Edge segment based stereo analysis,” SPIE Vol. 1293, Applications of Artificial Intelligence VIII, pp. 140–151, 1990) do not provide clear depth boundaries (depth discontinuities) that are needed in forming a clear foreground depth mask.
Another embodiment of depth image formation is 3D (three-dimensional) reconstruction of medical images. Dobbins et al. (“Digital x-ray tomosynthesis: current state of the art and clinical potential”, Phys. Med. Biol. 48 (2003) R65–R106) provide a review of x-ray tomosynthesis. Digital x-ray tomosynthesis is a technique for producing slice images using conventional x-ray systems. It is a refinement of conventional geometric tomography, which has been known since the 1930s.
In conventional geometric tomography, an x-ray tube and image receptor move in synchrony on opposite sides of the patient to produce a plane of structures in sharp focus at the plane containing the fulcrum of the motion; all other structures above and below the fulcrum plane are blurred and thus less visible in the resulting image. Tomosynthesis improves upon conventional geometric tomography in that it allows an arbitrary number of in-focus planes to be generated retrospectively from a sequence of projection radiographs that are acquired during a single motion of the x-ray tube. By shifting and adding these projection radiographs, specific planes may be reconstructed. While the methods reviewed in the Dobbins article can provide users with adequate 2D images, the methods do not produce meaningful 3D structures (depth information).
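The shift-and-add principle can be illustrated with a toy one-dimensional sketch. It assumes integer pixel shifts and a made-up geometry in which a structure in a given plane moves by a fixed number of pixels between successive projections; the function names and the test data are hypothetical and do not reproduce any method from the Dobbins review.

```python
# Hypothetical sketch: 1D shift-and-add reconstruction of a chosen plane.

def shift_signal(signal, shift):
    """Shift a 1D projection by an integer number of pixels, zero-filling the ends."""
    shifted = [0] * len(signal)
    for i, value in enumerate(signal):
        j = i + shift
        if 0 <= j < len(signal):
            shifted[j] = value
    return shifted


def shift_and_add(projections, shifts):
    """Shift each projection by its plane-specific amount, then average."""
    total = [0] * len(projections[0])
    for projection, shift in zip(projections, shifts):
        for i, value in enumerate(shift_signal(projection, shift)):
            total[i] += value
    return [value / len(projections) for value in total]


# Three views of two point structures: B stays at pixel 2 (fulcrum plane),
# A moves one pixel per view (a different plane).
projections = [
    [0, 0, 1, 0, 1, 0, 0, 0, 0],   # A at pixel 4
    [0, 0, 1, 0, 0, 1, 0, 0, 0],   # A at pixel 5
    [0, 0, 1, 0, 0, 0, 1, 0, 0],   # A at pixel 6
]
plane_b = shift_and_add(projections, shifts=[0, 0, 0])    # B in focus
plane_a = shift_and_add(projections, shifts=[1, 0, -1])   # A in focus
```

In each reconstructed plane the aligned structure keeps its full intensity while the other structure is spread over three pixels at one third of its intensity, which is the blurring of out-of-plane structures described above.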
Accordingly, there exists a need for a method to provide clear depth boundaries so that a meaningful depth image, or map, can be formed.
One use is to provide an image composite system wherein a foreground depth mask (or a 3D structure) is formed by analyzing the depth map of a scene. While this depth image, or map, can be used in the preferred embodiments of the present invention in connection with an image composite system, it should be recognized that such a depth image would be useful in a variety of situations, such as in the formation of virtual images. As such, an object is to provide a scene depth imaging system in which a scene depth map produced from a plurality of images provides meaningful depth data.