1. Technical Field
The invention is related to a system for de-noising multiple copies of a signal, and in particular, to a system for automatically combining two or more partially occluded copies of an image from approximately the same viewpoint to produce a composite image that is less occluded than any of the individual copies.
2. Related Art
A mosaic image is an image that is constructed from parts of two or more images of a scene. There are a number of well-known conventional schemes for mosaicing images. Typically, these conventional schemes first align or register two or more image frames of a particular scene. One or more selected image portions, represented by groups of contiguous pixels from one or more of the aligned images, are then integrated into a composite or mosaic image. Thus, the resulting mosaic is constructed as a patchwork image containing portions from two or more of the aligned image frames.
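As a minimal sketch of the integration step, assume the frames have already been registered to a common pixel grid and that some other process has already chosen, for every pixel, which frame should supply it (both are assumptions; the text above does not say how either is done). The mosaic is then assembled by copying each pixel from its designated frame. The names `composite`, `Image`, and `source` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale image as rows of pixel values.
Image = List[List[float]]


def composite(frames: Sequence[Image], source: List[List[int]]) -> Image:
    """Assemble a mosaic by copying each pixel from a chosen aligned frame.

    Assumes every frame is already registered and has the same size, and
    that source[y][x] is the index of the frame supplying pixel (x, y).
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[frames[source[y][x]][y][x] for x in range(width)]
            for y in range(height)]


# Usage: take the left column from frame 0 and the right column from frame 1.
a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
print(composite([a, b], [[0, 1], [0, 1]]))  # [[1, 20], [3, 40]]
```

The resulting "patchwork" has hard edges wherever the source index changes, which is what the seam-smoothing techniques discussed next are meant to hide.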
Further, in order to provide for visually seamless integration of elements from different images into the mosaic image (so that the resulting mosaic doesn't actually look like a patchwork image), a number of conventional filtering techniques are used. For example, techniques including blending, feathering or some sort of linear or non-linear weighted pixel averaging along the edge of each portion added to the mosaic are commonly used for seamlessly adding such portions to the mosaic image. There are a number of similar techniques, well known to those skilled in the art, for seamlessly adding elements to the mosaic image.
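Feathering, one of the techniques named above, can be illustrated on a single image row. The sketch below assumes two already-aligned rows that meet at a vertical seam and applies a linear weight ramp across a transition band; the function name `feather_row` and its `seam` and `band` parameters are hypothetical, and practical systems use many variants (for example, non-linear weights or two-dimensional blend masks).

```python
from __future__ import annotations


def feather_row(left: list[float], right: list[float],
                seam: int, band: int) -> list[float]:
    """Blend two aligned rows across a vertical seam with a linear ramp.

    Pixels before `seam` come from `left`, pixels at or after seam + band
    come from `right`, and pixels inside the band are a weighted average
    whose weight on `right` rises linearly toward 1.
    """
    out = []
    for x, (l, r) in enumerate(zip(left, right)):
        if x < seam:
            w = 0.0
        elif x >= seam + band:
            w = 1.0
        else:
            w = (x - seam + 1) / (band + 1)
        out.append((1.0 - w) * l + w * r)
    return out


print(feather_row([0] * 6, [100] * 6, seam=2, band=3))
# [0.0, 0.0, 25.0, 50.0, 75.0, 100.0]
```

Instead of an abrupt jump from 0 to 100, the transition is spread over three pixels, which is what makes the seam less visible.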
One common use of mosaicing is to create, from a set of two or more images, an image that includes either more or fewer elements than any of the images alone. For example, where one image of a scene includes an object not in a second image of the same scene, it may be desired to construct a mosaic image of the scene based on the second image, wherein the mosaic image includes the second image of the scene as well as the object from the first image of the scene. Conversely, it may be desired to construct a mosaic image of the scene based on the first image, wherein the mosaic image includes the first image of the scene but omits the object that appears in that first image. Such uses for mosaic images are well known to those skilled in the art.
Clearly, the preceding example extends to the case where an object that is occluding one part of a first image is removed by incorporating non-occluded parts of another image into the first image. While this idea is conceptually simple, implementing it can be quite complex. For example, one straightforward method for creating such mosaics is simply for a user to manually select a portion of one image and paste it into another image. This process can be repeated as many times as desired to create a mosaic image containing the desired elements. However, this “simple” case actually relies on the human mind to identify occluding objects and to select non-occluded portions of other images for filling in the occluded areas of a target image.
One conventional scheme simply averages a number of aligned image frames of a scene to produce a relatively non-occluded scene. However, simple image averaging tends to introduce artifacts such as “ghosting,” wherein objects visible in only a relatively small number of image frames are faintly visible in the composite image.
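The ghosting artifact follows directly from the arithmetic of averaging: an occluder present in k of n aligned frames still contributes a fraction k/n of its intensity to the composite. A minimal sketch, assuming each aligned frame is a flat list of grayscale values (the function name `average_frames` is hypothetical):

```python
from statistics import fmean


def average_frames(frames):
    """Per-pixel mean of aligned frames, each given as a flat list of pixels."""
    return [fmean(values) for values in zip(*frames)]


# Five aligned frames of a two-pixel scene whose true value is 100 everywhere.
# A dark occluder (value 0) covers pixel 0 in only one of the frames.
frames = [[100, 100]] * 4 + [[0, 100]]
print(average_frames(frames))  # [80.0, 100.0]
```

Pixel 0 comes out at 80 rather than 100: the occluder, visible in only one frame in five, survives as a faint ghost.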
Another conventional scheme takes the per-pixel median of a number of aligned image frames to produce a relatively non-occluded scene or, alternatively, selects the most common value at each location across the aligned image frames. In this way, a portion of the scene that is not occluded in a majority of the aligned image frames will be selected, and any region that is occluded in only a minority of the aligned image frames will be replaced by a portion from one of the non-occluded images. Several variations on these schemes are possible, in which the individual aligned image frames essentially “vote” to determine which frames contain occluded data and which contain non-occluded data. Unfortunately, such voting does not work unless the non-occluded aligned images are in the majority at every location of the scene. For example, where only two aligned image frames are available and their difference indicates non-negligible occlusion in at least one of them, neither the median nor the most-common-value approach can identify which frame is occluded.
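Both voting schemes, and the two-frame failure case, can be reproduced in a few lines. This is a minimal sketch, assuming flat lists of grayscale values as aligned frames; `median_frames` and `mode_frames` are hypothetical names, and reporting `None` for a tied vote is an illustrative choice, not part of any described scheme.

```python
from statistics import median, multimode


def median_frames(frames):
    """Per-pixel median of aligned frames (flat lists of pixel values)."""
    return [median(values) for values in zip(*frames)]


def mode_frames(frames):
    """Per-pixel most common value, or None where no single value wins."""
    result = []
    for values in zip(*frames):
        winners = multimode(values)
        result.append(winners[0] if len(winners) == 1 else None)
    return result


# Three frames: the occluder (0) covers pixel 0 in only one frame,
# so the non-occluded value wins the vote.
three = [[100, 50], [100, 50], [0, 50]]
print(median_frames(three), mode_frames(three))  # [100, 50] [100, 50]

# Two frames that disagree: the median is just the average of the two
# values, and the vote is a tie, so neither method identifies the occluder.
two = [[100], [0]]
print(median_frames(two), mode_frames(two))  # [50.0] [None]
```

With only two frames there is no majority to appeal to, which is exactly the limitation stated above.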
Other methods use “background subtraction” type techniques, which subtract one image from another following image alignment or registration, to identify areas of difference between the image frames. Given an otherwise static scene, any occluding objects are likely to be located within the identified areas of difference. However, determining which of the identified areas of difference between the images actually includes an occluding object, and which does not, is a significantly more complex problem.
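A minimal sketch of the differencing step, assuming two aligned images given as flat lists of grayscale values and a fixed threshold chosen by hand (the function name `difference_mask` and the threshold value are hypothetical):

```python
def difference_mask(a, b, threshold):
    """Flag pixels whose absolute difference between two aligned images
    exceeds `threshold` (a hypothetical, scene-dependent tolerance)."""
    return [abs(p - q) > threshold for p, q in zip(a, b)]


# Pixels 1 and 3 differ strongly; pixel 2 differs only by small noise
# and stays below the threshold.
a = [100, 100, 102, 30]
b = [100, 0, 100, 200]
print(difference_mask(a, b, threshold=10))  # [False, True, False, True]
```

Because `abs(p - q)` is symmetric, swapping `a` and `b` yields the identical mask. The difference localizes candidate occlusions but carries no information about which image contains them, which is the harder problem noted above.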
For example, a number of automatic schemes have been proposed or implemented for identifying occluding objects in image frames. These schemes include methods for automatically modeling a sequence of images, such as a video sequence, using a layered representation that segments the images into individual components. These individual components can then be used to create mosaic images, or even mosaic video sequences. For example, once the individual components of an image sequence have been identified, they can be used in combination with other image frames from the sequence to remove those components or objects from the image sequence, thereby removing an “occlusion” from the scene. Alternatively, such objects can simply be inserted into image frames of another image sequence, thereby overlaying, or occluding, the scene represented by that image sequence.
In general, the basic idea of such schemes is to isolate or identify a particular object or objects within a sequence of images using some sort of motion model for detecting movement of objects between image frames, and then to decompose that image sequence into a number of layers, with each layer representing either an object or a background image over the entire image sequence. Such layered objects are commonly referred to as “sprites.” These sprites can then be inserted into or extracted from particular image frames to create a desired mosaic effect.
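The insertion and extraction step can be illustrated with generic back-to-front layer compositing, assuming the hard part (estimating each sprite's appearance and mask with a motion model) has already been done. The `(appearance, mask)` layer format and the function name `render_layers` are hypothetical and are not taken from any particular layered-representation scheme.

```python
def render_layers(background, sprites):
    """Composite sprite layers over a background, back to front.

    Each sprite is a hypothetical (appearance, mask) pair of flat lists of
    equal length; mask values lie in [0, 1] and give per-pixel opacity.
    """
    out = list(background)
    for appearance, mask in sprites:
        out = [m * s + (1 - m) * o for o, s, m in zip(out, appearance, mask)]
    return out


background = [10, 10, 10]
person = ([99, 99, 99], [0, 1, 0])  # hypothetical sprite covering pixel 1

print(render_layers(background, [person]))  # [10, 99, 10]  (sprite inserted)
print(render_layers(background, []))        # [10, 10, 10]  (sprite extracted)
```

Once the layers are known, removing an occluder amounts to re-rendering without its layer; the difficulty described next lies in recovering those layers in the first place.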
However, learning “sprites” from an image sequence is a difficult task because there is typically an unknown number of objects in the image sequence, those objects typically have unknown shapes and sizes, and they must be distinguished from the background, other sprites, sensor noise, lighting noise, and significant amounts of deformation. Further, unless the frames of the image sequence are closely temporally related, or there is a sufficiently large number of image frames, it becomes difficult or impossible to identify objects through the use of motion models. Consequently, such schemes are not typically useful in cases involving a limited number of image frames, or where those image frames were captured at different times, sufficiently far apart that temporal motion models are ineffective for identifying objects or “sprites” in the images.
In addition, other conventional schemes for identifying objects within an image sequence make use of specialized models for identifying particular types of objects, such as, for example, a car, a truck, a human head, a ball, an airplane, etc. Models designed for identifying one particular type of object within an image sequence are typically ineffective for identifying other types of objects. Further, such models typically operate best as the number of image frames increases and as the objects within the image frames exhibit some observable motion from frame to frame. Therefore, in the case of limited image sequences, such as, for example, where there are only two images, such schemes are often unable to determine whether a portion of one image is actually an occlusion of the scene or simply a part of the scene.
Still other conventional image modeling schemes use techniques for probabilistic pattern analysis and pattern classification to identify elements within an image sequence. Such schemes tend to be computationally expensive, and again, they tend to operate poorly with limited input images, such as the case where it is necessary to decide which of two images includes an occlusion and which does not.
Consequently, what is needed is a system and method for identifying occlusions in limited sets of images. Further, such a system and method should be capable of operating independently of any temporal relationships or motions of objects between the images. In addition, such a system and method should be capable of determining whether any portion of a single image of a scene, identified as being different from another image of the same scene, is occluded by simply analyzing that identified portion of the single image by itself. Finally, such a system and method should be capable of automatically removing identified occlusions by creating a mosaic image from non-occluded portions of two or more images.