Conventional methods and systems of generating composite images allow a user to extract people or objects from a scene, and composite them in front of a different, “fun” background, such as a Las Vegas skyline or an image of the moon. In the past, four methods have been used to accomplish this result: (1) A special uniformly colored screen or bright background is used behind the people/objects of interest, and a foreground mask is created using a “linear blue screen” method or “chroma key” method. An example of this method is described in U.S. Pat. No. 5,424,781, which is entitled “Backing Color and Luminance Nonuniformity Compensation for Linear Image Compositing”. This method can give excellent results, but requires that the user have an expensive, carefully lit, colored background. (2) The people/objects of interest may be captured in front of any type of background, and then “cut” out of the background electronically using software tools available in such software packages as Adobe Photoshop™ version 6.0, for example. Unfortunately, for most subjects such as people, use of such software tools is a time-consuming and difficult process that typically yields a less than realistic looking border around the image. (3) The people/objects of interest may be captured in front of any type of background, and then the people/objects of interest are removed and the background itself is captured in a second image. An example of this method is found in commonly-assigned U.S. Pat. No. 5,914,748, which is entitled “Method and Apparatus for Generating a Composite Image Using the Difference of Two Images”, where the two images are subtracted to form a difference image, and the difference image is digitally processed to form a mask image. The mask image is used to extract the people/objects of interest. 
Unfortunately, even though this method is superior to the first two methods, it requires that the camera be firmly mounted on a tripod, or otherwise stabilized by a stationary object. (4) The people/objects of interest are separable from the background according to the difference in depth as captured by a companion depth map. As disclosed in commonly assigned U.S. patent application Ser. No. 09/382,451, which is entitled “Method for Forming a Depth Image from Digital Image Data,” and was filed Aug. 25, 1999, a plurality of images associated with the same scene are captured to produce a depth map of the scene, making it possible to distinguish foreground subjects from the background for extracting the foreground subject that is to be inserted into other images.
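The difference-image approach of method (3) can be illustrated with a minimal sketch. It is not the processing disclosed in the '748 patent: the function names, the fixed threshold, and the per-channel maximum are assumptions chosen for illustration, and the two input images are assumed to be the same size and already perfectly aligned.

```python
import numpy as np

def difference_mask(with_subject, background_only, threshold=30):
    """Return a boolean mask that is True where the two images differ.

    Hypothetical helper: the threshold value and the max-over-channels
    rule are illustrative assumptions, not taken from the cited patent.
    """
    diff = np.abs(with_subject.astype(np.int32) - background_only.astype(np.int32))
    if diff.ndim == 3:
        # Color image: a pixel counts as changed if any channel changed.
        diff = diff.max(axis=2)
    return diff > threshold

def composite(with_subject, new_background, mask):
    """Copy masked pixels of the subject image over a new background."""
    m = mask[..., None] if with_subject.ndim == 3 else mask
    return np.where(m, with_subject, new_background)

# Synthetic example: a bright 2x2 "subject" on a uniform gray background.
background = np.full((4, 4, 3), 100, dtype=np.uint8)
subject_shot = background.copy()
subject_shot[1:3, 1:3] = 200
mask = difference_mask(subject_shot, background)
result = composite(subject_shot, np.zeros_like(background), mask)
```

Because the mask is computed pixel by pixel, any shift between the two captures makes background edges appear in the difference image, which is why this approach depends on a stabilized camera.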
In another method of extracting objects of interest from an image, found in U.S. Pat. No. 6,377,269, which is entitled “Automated Generation of Masks for Photo-Compositing”, an algorithm relies on two images which represent the same foreground against different backgrounds. This method does not require that the camera be firmly mounted on a tripod, or otherwise stabilized by a stationary object. However, if there is mis-registration between the two supplied images, edge detail may be lost. Thus this method includes a step of registering the images in order to remove errors caused by slight camera movement between image captures or by misalignment during scanning for images photographed with film. However, the disclosed alignment tool, which provides semi-automated and manual sub-pixel alignment of the images in both translation and rotation, would require knowledge and effort from the user in order to align the images. Furthermore, the method relies on single-color backgrounds, which prohibits it from being practiced with ordinary consumer images that contain backgrounds with arbitrary and unknown content.
An automatic technique such as phase correlation (see C. Kuglin and D. Hines, “The Phase Correlation Image Alignment Method”, Proc. 1975 International Conference on Cybernetics and Society, pp. 163–165, 1975) could be used to automatically register images prior to subject extraction. Phase correlation relies only on the phase content of the images to be registered, and not on the color or intensity content; therefore, phase correlation would be amenable to registering two images containing common subjects, but differently colored solid backgrounds, since the phase content of the backgrounds does not differ. However, this technique would not be amenable to extracting a subject from two images, one containing the subject plus background, and the second containing background only, when the images are ordinary consumer images whose backgrounds contain arbitrary and unknown content. In these circumstances, the phase content of the first image differs from that of the second, making automatic registration a much more difficult problem.
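As an illustration of the phase correlation idea described above, the following minimal sketch estimates a whole-pixel translation between two images. It assumes two same-size, single-channel arrays related by a circular shift; the function name is hypothetical, and sub-pixel refinement, rotation, and windowing are omitted.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate the integer (row, col) shift d with moved ~ np.roll(ref, d).

    Illustrative sketch only: assumes same-size grayscale arrays and a
    purely translational (circular) offset between them.
    """
    f_ref = np.fft.fft2(ref)
    f_mov = np.fft.fft2(moved)
    cross = f_mov * np.conj(f_ref)
    # Discard magnitude so that only phase differences remain.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

# Synthetic check: shift a random image by a known offset and recover it.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moved = np.roll(ref, (3, -5), axis=(0, 1))
shift = phase_correlation_shift(ref, moved)
```

The correlation surface shows a single sharp peak only when the two inputs share most of their phase structure. When one image contains a subject that the other lacks, the peak is weakened, which is consistent with the difficulty noted above.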
U.S. Pat. No. 6,301,382, which is entitled “Extracting a Matte of a Foreground Object from Multiple Backgrounds by Triangulation”, describes a method for extracting a matte of a foreground object from a composite image, using a computer. An image of the foreground object is recorded over at least two backgrounds having arbitrarily different coloring. Even though a registration step is included in certain embodiments of the method (e.g., see FIGS. 5 and 6 in the '382 patent), this registration step is used only to remove alignment errors introduced by misalignment of film in the camera as subsequent images are captured, or by misalignment of film in a scanner. The patent states explicitly (column 16, lines 1–2) that a remote-controlled shutter is used to guard against slight camera movements. This indicates that the camera is somehow stabilized during image capture, which necessitates a tripod or other stabilization mechanism. Thus this patent fails to suggest any method for matte extraction that contains a registration step capable of automatically aligning images captured in the midst of any camera movement. Moreover, each point of one background must have a color that is different from the color of the corresponding point in the other backgrounds, which prohibits the method from being practiced with ordinary consumer images that contain backgrounds with arbitrary and unknown, but nonetheless similar, content.
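The requirement that corresponding background points differ in color can be seen in a minimal sketch of a commonly cited two-background triangulation relation. The sketch assumes the compositing model C_i = F + (1 - alpha) * B_i with a premultiplied foreground F. That model, the function name, and the tolerance are assumptions for illustration and are not taken from the text of the '382 patent.

```python
import numpy as np

def triangulation_alpha(c1, c2, b1, b2, eps=1e-6):
    """Solve for alpha per pixel from two shots over known backgrounds.

    Assumed model (hypothetical here): c_i = F + (1 - alpha) * b_i, so
    c1 - c2 = (1 - alpha) * (b1 - b2), solved in the least-squares sense.
    Pixels whose two backgrounds match are returned as NaN (unsolvable).
    """
    dc = c1.astype(np.float64) - c2.astype(np.float64)
    db = b1.astype(np.float64) - b2.astype(np.float64)
    denom = (db * db).sum(axis=-1)   # squared background color difference
    num = (dc * db).sum(axis=-1)
    alpha = np.full(denom.shape, np.nan)
    valid = denom > eps
    alpha[valid] = 1.0 - num[valid] / denom[valid]
    return np.clip(alpha, 0.0, 1.0)

# Synthetic example: black and white backgrounds, true alpha of 0.25.
b1 = np.zeros((2, 2, 3))
b2 = np.ones((2, 2, 3))
b2[0, 0] = b1[0, 0]                  # one pixel where the backgrounds match
fg = 0.25 * np.array([0.8, 0.2, 0.4])
c1 = fg + 0.75 * b1
c2 = fg + 0.75 * b2
alpha = triangulation_alpha(c1, c2, b1, b2)
```

Where the two backgrounds have the same color, the denominator vanishes and alpha cannot be recovered, which mirrors the limitation described above for backgrounds with similar content.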
U.S. Pat. No. 5,262,856, which is entitled “Video Image Compositing Techniques”, describes a method for aligning images from a sequence of video frames, where the method includes aligning the backgrounds of images that contain moving subjects in order to produce an image sequence having a wider aspect ratio. This technique is capable of automatically aligning images with a common background and moving subjects without requiring a tripod or other stabilization means. However, the method is only reliable when the camera view does not change drastically over the course of the image sequence. The method relies on a motion estimation technique for alignment, which assumes that any misalignment between subsequent images can be described locally. This constrains subsequent images to be captured from roughly the same viewpoint. Furthermore, the patent suggests a spherical or cylindrical projection of the wide aspect ratio image in order to combine elements from different frames with effectively little or no distortion. Spherical and cylindrical projection both implicitly assume that the viewpoint of the camera remains the same in subsequent images, and that any motion is due to a rotation of the camera about its nodal point. Therefore, this method is not amenable to extracting the subject of an image given multiple images that are captured from different viewpoints.
What is required is a method of extracting people/objects of interest from images containing arbitrary and unknown backgrounds that can quickly and easily be accomplished by inexperienced users, and that gives acceptable results, without requiring a special colored background, or the stabilization of the camera by a tripod or other stationary object, and that is robust under fairly large camera motion.