(1) Field of Invention
The present invention relates to a system for three-dimensional object recognition and foreground extraction and, more particularly, to a system for three-dimensional object recognition using an appearance-based model and foreground extraction through segmentation, nearest neighbor and clustering approaches.
(2) Description of Related Art
Prior art exists which describes recognition of three-dimensional objects in two-dimensional images. However, the current prior art has been unable to reliably recognize and segment objects in multi-object scenes. A primary reason these algorithms fail is the accurate object-pose classification that the segmentation code demands. Segmentation refers to the process of partitioning an image into multiple segments and is typically used to locate objects and their boundaries in images. Segmentation algorithms in the prior art compute a transformation from features matched to a specific training pose to features matched in the test pose. However, when too few features match any specific training pose, or when matches are ambiguous (as with feature-less objects), the algorithm fails in many cases. Because the recognition depends on the segmentation, recognition also fails.
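The transform computation described above, from training-pose keypoints to test-image keypoints, can be sketched as a least-squares affine fit. The following is an illustrative sketch only (the function names and the choice of an affine model are assumptions, not taken from the prior art under discussion):

```python
import numpy as np

def estimate_affine(train_pts, test_pts):
    """Least-squares 2x3 affine transform mapping training-pose keypoints
    to test-image keypoints. Inputs are (N, 2) arrays of matched points,
    N >= 3. (Illustrative sketch; not the method of any cited reference.)"""
    n = train_pts.shape[0]
    # Design matrix for x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = train_pts
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = train_pts
    A[1::2, 5] = 1.0
    b = test_pts.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params.reshape(2, 3)  # [[a, b, tx], [c, d, ty]]

def apply_affine(M, pts):
    """Map (N, 2) points through a 2x3 affine matrix."""
    return pts @ M[:, :2].T + M[:, 2]
```

With too few correspondences, or with ambiguous matches on feature-less objects, this fit becomes ill-posed or wildly wrong, which is precisely the failure mode noted above.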
Additionally, prior art in the field involves applying local feature algorithms to monochrome images, classifying each detected feature (also referred to as a "keypoint") to a previously trained object's pose, and computing a transform from the matched pose features to the current features in order to map a trained object's pose boundary onto the currently detected object. This prior art has several disadvantages. First, local feature algorithms are described for monochrome images; how to extend them to color images is not clear. Literature exists on combining all three channels by appending the three feature descriptors into a single descriptor. However, this makes the dimensionality of the feature descriptor, which is already quite large, even larger, and therefore more sensitive to distortions. Second, a scene containing multiple objects, known and unknown, presents many challenges: incorrectly classified keypoints on the boundaries of the objects cause the recognition and, consequently, the segmentation algorithm to fail. Third, the segmentation algorithms in the prior art require reliable feature matches to one training object's pose in order to segment the objects in the current image. However, when an inadequate number of features is detected, or when training and test features are incorrectly matched (especially given ambiguous matches on texture-less objects), the segmentation fails.
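The channel-appending scheme criticized above can be illustrated as follows; the descriptor length of 128 corresponds to a standard SIFT descriptor, and the function name is hypothetical:

```python
import numpy as np

def concat_color_descriptors(desc_r, desc_g, desc_b):
    """Append per-channel descriptors (e.g., 128-D SIFT per channel) into
    one vector. The dimensionality triples, which, as noted in the text,
    makes the descriptor more sensitive to distortions."""
    return np.concatenate([desc_r, desc_g, desc_b])
```

For three 128-dimensional channel descriptors, the appended descriptor is 384-dimensional, illustrating the dimensionality growth the passage objects to.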
Thus, a continuing need exists for a system for three-dimensional object recognition in color images which does not increase the dimensionality of the feature descriptor, thereby keeping the computation efficient in spite of the increase in information about the image.
The present invention is also directed to the field of foreground extraction. Prior art exists which describes methods of removing the background and extracting foreground objects from images. Two well-known approaches rely on a stationary background/camera and on feature detection algorithms. In the case of a stationary camera, one can use a background mask. Background masking works well only when the camera remains in the same location and orientation (i.e., viewpoint angle) for as long as the computer vision algorithm is in use. This is inconvenient because users often need to relocate the camera or have a robot move it around, and the orientation of the camera might also be changed by accident; any of these actions renders the background mask ineffective. Background subtraction is effective when the computer vision algorithm expects the foreground objects to move while the background stays still, as described by Chien et al. in "Efficient Moving Object Segmentation Algorithm Using Background Registration Technique," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 7, July 2002. These prior techniques will also fail if the lighting changes between frames.
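The stationary-camera background subtraction described above can be sketched as simple thresholded differencing against a stored background frame (the function name and threshold value are illustrative assumptions):

```python
import numpy as np

def foreground_mask(frame, background, threshold=25):
    """Background subtraction for a stationary camera: pixels whose
    absolute difference from the stored background frame exceeds the
    threshold are labeled foreground. As the text notes, this fails
    if the camera moves or the lighting changes between frames."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold
```

A global lighting change shifts every pixel's difference at once, pushing background pixels past the threshold, which is why such techniques break when illumination varies.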
Furthermore, one could use features detected by a feature detection algorithm such as SIFT (Scale-Invariant Feature Transform), as described by Lowe in "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision (60), 2004, 91-110 (hereinafter referred to as the Lowe reference), or SURF (Speeded-Up Robust Features), as described by Bay et al. in "SURF: Speeded-Up Robust Features," European Conference on Computer Vision (ECCV), 2006 (hereinafter referred to as the Bay reference), to eliminate features that might belong to the background. However, this leaves behind fairly pixelated regions of the background that are not well segmented. The Lowe reference and the Bay reference are hereby incorporated by reference as though fully set forth herein.
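The feature-based background elimination described above can be sketched as a nearest-neighbor test against stored background descriptors; descriptors whose nearest background neighbor is very close are rejected. The function name and distance threshold below are hypothetical, and the descriptors stand in for SIFT/SURF outputs:

```python
import numpy as np

def reject_background_features(test_desc, bg_desc, max_dist=0.3):
    """Discard test descriptors whose nearest neighbor among the stored
    background descriptors lies within max_dist; the survivors are
    presumed foreground. (Illustrative sketch; thresholds are assumed.)"""
    # Pairwise Euclidean distances, shape (N_test, N_bg).
    d = np.linalg.norm(test_desc[:, None, :] - bg_desc[None, :, :], axis=2)
    keep = d.min(axis=1) > max_dist
    return test_desc[keep], keep
```

Because this operates only at sparse keypoint locations, the pixels between surviving keypoints are never classified, leaving the poorly segmented background regions the passage describes.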
Most computer vision applications that need background removal keep the camera fixed in the same position, as with a surveillance camera, for example. Having the camera in a fixed location and orientation speeds up computation, since the algorithms can be trained on how the foreground objects appear from that particular viewpoint. Moreover, these methods will not work if the lighting changes from one frame to the next.
Thus, a continuing need exists for a method for extraction of foreground objects and the correct rejection of background from an image of a scene despite changes in lighting or camera viewpoint.