The invention relates to processing of image data and, more particularly, to a method for processing a plurality of images associated with a scene imaging system that is capable of producing depth information of the scene. Still more specifically, the invention pertains to a method of distinguishing the foreground and background of the scene in a depth space in order to extract the foreground that is to be inserted into other images.
Foreground extraction or background suppression has been a topic in the composite photography and cinematography industry for many years. For instance, in U.S. Pat. No. 3,778,542 (issued Dec. 11, 1973 to L. C. Hanseman and entitled “Blue screen travelling matte system”), a blue screen travelling matte system is used to create special photographic effects. In this design, a particular selectable saturated color appearing in the simultaneous red, blue and green video output signals of an electronic color camera is sensed and removed from the video signals by electronic subtraction of the selected color. The output red, blue and green video signals derived from a second electronic color camera are substituted for the removed saturated color. The final composite red, blue and green video output signals therefore contain picture elements from both cameras, combined in a manner such that the specific saturated color from the first camera is completely eliminated and replaced by picture elements derived from the second camera. One of the limitations of this system is that a uniformly colored (blue, green, or any constant backing color) background is required in order to extract the region of interest (a human figure, for instance, in a newscast). This requirement in turn demands a constrained, structured environment that limits the usage of this technology to very controlled situations. Many variations of the blue screen method have been developed over the years (see, e.g., U.S. Pat. Nos. 5,812,214; 5,251,016; 4,629,298) but they all have the same limitations mentioned hereinabove.
It is understood that the purpose of imposing the aforementioned background restrictions is to compensate for the lack of sufficient information in the background suppression process. If these restrictions are to be removed, other information must be included so that the problem is still solvable. For example, in Tsai et al. (“Segmenting focused objects in complex visual images,” Pattern Recognition Letters, 19, pp. 929-940, 1998) the measurement of defocus of object edges in an image is used to separate complex foreground and background objects. Thus Tsai et al. introduce the notion of spatial separation by looking at the degree of defocusing of the image objects. It is known that a two-dimensional planar intensity image is a perspective projection of a three-dimensional scene. The degrees of freedom are thereby reduced from 3 to 2, and the spatial information in the third dimension appears to be lost in the course of projection. This lost spatial information is physically the distance (depth) from the scene to the sensing device, that is, the camera. In fact, the depth information is embedded in the original image pixel locations, in that these locations are tightly related to the depth of the corresponding 3D scene points. This spatial information, which is lost in the 2D projection, can be recovered by searching for corresponding points (pixel locations) in a plurality of displaced intensity images of the scene.
FIG. 1 illustrates an exemplary background suppression system equipped with a pair of cameras 11a and 11b that capture two color images: top image 13b and bottom image 13a. Notice that the contents in the two images, e.g., the respective person images 14a and 14b and computer images 15a and 15b, have a vertical dislocation if the edges of the image frames are aligned. This dislocation is called global disparity, and it is a function of the average distance of the scene from the camera. The system needs to find individual disparity corresponding to each visible surface point in the scene so that a depth image can be produced. The value of each pixel in the depth image will represent the distance from the corresponding scene point being projected to that pixel location. In foreground extraction situations, the depth image is usually displayed as a gray scale image 10 as shown in FIG. 1 although it could also be displayed as a color image if the gray scale is color-coded. The depth image 10 in FIG. 1 reveals that a person 17 is in the foreground with a higher gray scale and a computer 16 is in the background with a lower gray scale. Intuitively, the foreground can be separated from the background based on such depth values. The separation of foreground and background can lead to the formation of a foreground mask image 18 showing a depth mask 19. The mask 19 is then used to select the corresponding person region 21 of the bottom intensity image 13a, and thereby produce a foreground image. The same mask is also used in compositing images as shown in FIG. 2 where the person 21 is added to the scene of a door 31. In this case, the foreground mask image 18, with the person depth mask 33, is used to suppress a portion of the background in the door image 31, thereby generating an intermediate image 34 in which a portion 35 of the door 36 is blocked out so that a person region 21 may be substituted in its place in the resultant composite image 41. 
Notice that the suppressed background is not a constant backing color scene.
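The masking and compositing steps described for FIG. 1 and FIG. 2 can be illustrated with a minimal sketch. It assumes images are nested lists of pixel values, that larger depth values mark nearer surfaces (as in the gray scale depth image 10), and that a single depth threshold is enough to form the mask. The function names `depth_mask` and `composite` and the threshold are hypothetical and are not taken from the specification.

```python
def depth_mask(depth, threshold):
    """Binary foreground mask: 1 where the depth value reaches the threshold.

    Assumption: larger values mean nearer surfaces, as in the gray scale
    depth image of FIG. 1. A plain threshold is only an illustration.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in depth]


def composite(foreground, background, mask):
    """Take the foreground pixel where the mask is set, else the background."""
    return [
        [fg if m else bg for fg, bg, m in zip(fg_row, bg_row, m_row)]
        for fg_row, bg_row, m_row in zip(foreground, background, mask)
    ]


# Toy 2x3 example: 9 stands for "person" pixels, 1..6 for "door" pixels.
depth = [[10, 200, 20], [15, 210, 190]]
person = [[9, 9, 9], [9, 9, 9]]
door = [[1, 2, 3], [4, 5, 6]]

mask = depth_mask(depth, threshold=128)
result = composite(person, door, mask)
```

Because the mask comes from depth rather than color, the background being replaced can have arbitrary content.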
From another perspective, separating foreground and background is essentially an image segmentation task that is formidable without a model, especially where there is a complex background. For example, FIG. 3 presents a scene with a person 63 in the foreground and a face picture 62 in the background. Usually, a face model would be used to single out the person in the image. If so, the face picture in the background will still be classified as part of the foreground and be selected as well. However, with the help of depth information, background suppression for this kind of scene would be possible. Accordingly, using the depth information of a scene is an effective way to extract the foreground or suppress the background of the scene image. The key issues, however, are the acquisition and processing of the depth information. Conventional depth recovery algorithms (see S. B. Marapane and M. M. Trivedi, “Region-based stereo analysis for robotic applications,” IEEE Trans. Systems, Man, and Cybernetics, 19(6): 1447-1464, 1989, and S. B. Marapane and M. M. Trivedi, “Edge segment based stereo analysis,” SPIE Vol. 1293, Applications of Artificial Intelligence VIII, pp. 140-151, 1990) do not provide clear depth boundaries (depth discontinuities) that are needed in forming a clear foreground depth mask.
What is therefore needed is a way to provide clear depth boundaries so that an accurate depth image, or map, can be formed. One use is to provide an image composite system wherein a foreground depth mask is formed by analyzing the depth map of a scene. While this depth image, or map, would be used in the preferred embodiment in connection with an image composite system, it should be clearly recognized that such a depth image would be useful in a variety of situations, such as in the formation of virtual images. Consequently, the basic object is to provide a scene depth imaging system in which a scene depth map produced from a plurality of images provides more accurate depth data.
It is an object of the present invention to provide a scene depth imaging system in which a scene depth map produced by a plurality of images provides more accurate depth data.
It is a further object of the present invention to provide an image composite system wherein a foreground depth mask is formed by analyzing the depth map of a scene.
The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, a method and a computer program product for forming a depth image of a scene comprises the steps of: (a) generating intensity parameters corresponding to image features in each of two intensity images of a scene, the intensity parameters in one image pairing with intensity parameters in the other image to form pairs of intensity parameters indicative of potential correspondence between features in the two intensity images; (b) eliminating one or more pairs of intensity parameters based on one or more constraints related to the feasibility of a valid match between the pairs of intensity parameters; (c) calculating a match score for each of the remaining pairs of intensity parameters; (d) processing the match scores of the remaining pairs of intensity parameters through a processing algorithm in order to find matched pairs of intensity parameters indicative of correspondence between the same features in the two intensity images; and (e) generating a depth image from the matched pairs of intensity parameters.
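Steps (a) through (e) can be sketched on a single scanline. This is an illustration under stated assumptions, not the claimed method: features are hypothetical `(position, intensity)` tuples, the feasibility constraint is a disparity range, the match score is a negative absolute intensity difference, the processing algorithm is a greedy one-to-one assignment, and depth follows the standard parallel-camera relation Z = f·B/d, which the text above does not state.

```python
def match_features(feats_a, feats_b, max_disparity):
    """Match features between two scanlines; returns {index_a: (index_b, disparity)}."""
    candidates = []
    # (a) pair every feature in image A with every feature in image B
    for i, (pos_a, int_a) in enumerate(feats_a):
        for j, (pos_b, int_b) in enumerate(feats_b):
            disparity = pos_a - pos_b
            # (b) eliminate pairs that violate the (assumed) disparity constraint
            if not 0 < disparity <= max_disparity:
                continue
            # (c) match score: higher is better (assumed intensity similarity)
            score = -abs(int_a - int_b)
            candidates.append((score, i, j, disparity))

    # (d) greedy one-to-one selection, best scores first (a stand-in for the
    #     unspecified processing algorithm)
    candidates.sort(key=lambda cand: cand[0], reverse=True)
    used_a, used_b, matches = set(), set(), {}
    for score, i, j, disparity in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches[i] = (j, disparity)
    return matches


def depth_from_matches(matches, focal_length, baseline):
    """(e) depth per matched feature, assuming Z = f * B / d."""
    return {i: focal_length * baseline / d for i, (_, d) in matches.items()}


feats_a = [(10, 100), (20, 50)]
feats_b = [(6, 98), (17, 52)]
matches = match_features(feats_a, feats_b, max_disparity=8)
depths = depth_from_matches(matches, focal_length=24.0, baseline=0.5)
```

The smaller disparity of the second feature yields the larger depth, which is the relationship a depth image encodes.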
In accordance with another aspect of the invention, a feature-point (edge point) guided matching method is used to find corresponding pixels in at least two different images presenting a scene so that an initial feature-point depth map of the scene can be computed. To reduce the mismatch rate, a consistency testing procedure is employed after each of the images has produced an initial feature-point depth map of the scene. A less noisy, but sparse, feature-point depth map is generated after the consistency testing procedure.
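One common form of consistency test, offered here only as an assumption about what such a procedure might look like, keeps a match when matching from the first image to the second and from the second back to the first agree. The name `consistent_matches` and the dictionary representation are hypothetical.

```python
def consistent_matches(a_to_b, b_to_a):
    """Keep only matches confirmed in both directions.

    a_to_b maps a feature index in image A to its match in image B;
    b_to_a is the independent result of matching from B to A.
    """
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


a_to_b = {0: 0, 1: 2, 2: 1}
b_to_a = {0: 0, 1: 2, 2: 3}
kept = consistent_matches(a_to_b, b_to_a)
```

Discarding unconfirmed matches is what makes the resulting feature-point depth map less noisy but sparse.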
In another aspect, the present invention provides a color property assisted depth propagation method to establish a complete feature-point depth map after the sparse feature-point depth map is obtained. This includes setting up a size adjustable window at each feature point that does not have a depth value; searching within the window for qualified feature points that have depth values and pass a color property check; and computing depth for the feature point that does not have a depth value using the depth values of the qualified feature points.
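The propagation steps can be sketched as follows. The data layout, the scalar color value, the doubling of the window size, the tolerance, and the use of a plain mean are all assumptions chosen for illustration; `propagate_depth` is a hypothetical name.

```python
def propagate_depth(points, start_half=1, max_half=8, color_tol=10):
    """Fill missing depths at feature points from similar-colored neighbors.

    points: {(row, col): {"color": number, "depth": float or None}}
    Returns {(row, col): estimated depth} for points that could be filled.
    """
    filled = {}
    for (r, c), p in points.items():
        if p["depth"] is not None:
            continue
        half = start_half
        while half <= max_half:
            # Qualified points: inside the window, with a depth value,
            # and passing the (assumed) color similarity check.
            qualified = [
                q["depth"]
                for (qr, qc), q in points.items()
                if q["depth"] is not None
                and abs(qr - r) <= half
                and abs(qc - c) <= half
                and abs(q["color"] - p["color"]) <= color_tol
            ]
            if qualified:
                # Assumption: combine qualified depths with a simple mean.
                filled[(r, c)] = sum(qualified) / len(qualified)
                break
            half *= 2  # enlarge the window and search again
    return filled


points = {
    (0, 0): {"color": 100, "depth": 2.0},
    (0, 3): {"color": 105, "depth": None},
    (0, 5): {"color": 200, "depth": 9.0},
    (0, 6): {"color": 102, "depth": 4.0},
}
filled = propagate_depth(points)
```

In this toy input the nearest point with a depth value, at (0, 5), is rejected by the color check, so the window grows until the two similar-colored points are found.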
According to another aspect of the invention, there is provided a method of separating the foreground and the background of the scene and suppressing the background based on the depth map of the scene. The method includes sorting the depth values in the depth map in a descending or ascending order; eliminating depth values at some feature points based on an order statistic obtained from the ordered depth values; and computing, in a column-wise fashion, a histogram of the number of feature points that still have depth values, so as to further remove the depth values at feature points that do not belong to the majority of the remaining feature points that have depth values.
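A minimal sketch of this separation follows. It assumes larger depth values mark nearer points, uses a fixed fraction of the sorted values as the order statistic, and treats a column as part of the majority when its count reaches a share of the peak column count. These choices, along with the name `select_foreground`, are illustrative assumptions only.

```python
def select_foreground(depths, num_cols, keep_fraction=0.5, min_col_share=0.5):
    """Keep feature points that are likely to belong to the foreground.

    depths: {(row, col): depth value}, larger assumed to mean nearer.
    """
    if not depths:
        return {}
    # Sort the depth values in descending order and pick an order statistic.
    ordered = sorted(depths.values(), reverse=True)
    cutoff = ordered[max(0, int(len(ordered) * keep_fraction) - 1)]
    kept = {pt: d for pt, d in depths.items() if d >= cutoff}

    # Column-wise histogram of the feature points that still have depth.
    histogram = [0] * num_cols
    for _, col in kept:
        histogram[col] += 1

    # Drop points in sparsely populated columns (not part of the majority).
    peak = max(histogram)
    return {pt: d for pt, d in kept.items()
            if histogram[pt[1]] >= min_col_share * peak}


depths = {
    (0, 1): 9, (1, 1): 8, (2, 1): 9, (0, 2): 8, (1, 2): 9, (0, 5): 8,
    (3, 0): 2, (3, 3): 1, (3, 4): 2, (3, 5): 1, (2, 4): 1, (1, 0): 2,
}
foreground = select_foreground(depths, num_cols=6)
```

Here the isolated near point in column 5 survives the order-statistic cut but is removed by the column histogram.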
According to another aspect of the invention, a foreground depth mask is generated using the depth map containing the foreground feature points. For every pixel that does not have a depth value within the foreground region that is determined by the foreground feature points, a length extendable eight-nearest-neighbor search is conducted to collect a sufficient number of feature points that have depth values and also satisfy a color criterion. An LMS (least median of squares) estimation is then performed using the collected feature points to compute a depth value for that pixel.
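The search and estimation can be sketched as below. The sketch reads the eight-nearest-neighbor search as a search along the eight compass directions with a growing step length, and it fits a constant depth by least median of squares using only the collected sample values as candidates, which approximates the exact estimator. Both readings, and all names and thresholds, are assumptions.

```python
# Offsets for the eight compass directions around a pixel.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def collect_neighbors(pixel, features, color, min_count=3,
                      max_length=10, color_tol=10):
    """Collect depths of similar-colored feature points along eight directions.

    features: {(row, col): (depth, color)}. The search length is extended
    until at least min_count depths are found or max_length is reached.
    """
    r, c = pixel
    found = []
    for length in range(1, max_length + 1):
        for dr, dc in DIRECTIONS:
            hit = features.get((r + dr * length, c + dc * length))
            if hit is not None and abs(hit[1] - color) <= color_tol:
                found.append(hit[0])
        if len(found) >= min_count:
            break
    return found


def lms_constant(values):
    """Constant-model LMS: the sample minimizing the median squared residual.

    For an even number of samples the upper median is used.
    """
    def median_squared_residual(candidate):
        residuals = sorted((v - candidate) ** 2 for v in values)
        return residuals[len(residuals) // 2]
    return min(values, key=median_squared_residual)


features = {
    (2, 4): (5.0, 100), (0, 2): (5.2, 98), (4, 0): (5.1, 103),
    (2, 1): (20.0, 101),   # similar color but outlying depth
    (3, 3): (4.9, 180),    # fails the color criterion
}
samples = collect_neighbors((2, 2), features, color=100)
estimate = lms_constant(samples)
```

The outlying depth of 20.0 is collected but does not pull the estimate, which is why a median-based estimator suits this step better than a mean.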
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims, and by reference to the accompanying drawings.
The present invention provides a method of forming a foreground depth mask, which in turn offers an alternative to blue screen technology for the foreground extraction problem, so that the need for a specially arranged environment with a constant backing color can be eliminated. The invention utilizes a plurality of images associated with a scene imaging system to produce a depth map of the scene for which the foreground is to be separated from the background. It thereby enables the foreground object of interest to be extracted against an arbitrary background rather than a uniformly colored one. Moreover, the image composite operation after the extraction can be conducted in either 2D space or 3D space.