1. Field of the Invention
The invention relates to a method and apparatus for matching portions of input images, and more particularly, to a method and apparatus that generates hierarchical graphs of aggregates to find matches between portions of images. The invention further relates to a method and apparatus for multiscale segmentation that combines motion and intensity cues.
2. Prior Art
Finding the correspondence between portions of input images is important for many vision tasks, such as motion estimation, shape recovery, and object recognition. Matching two images is particularly difficult when the baseline between the camera centers of the images is wide.
Segmentation of objects based on their motion is perceptually striking, as is exemplified by motion sequences containing random dots. Finding satisfactory algorithmic solutions to this problem, however, has remained a challenge. Algorithmic approaches to motion segmentation face both the difficulties that complicate intensity-based segmentation and the challenges that make motion estimation hard. Issues that complicate segmentation include devising an appropriate measure of similarity and rules of clustering to correctly separate the various segments. Difficulties in motion estimation arise from the sparseness of motion cues, particularly their absence in uniform regions, and from the aperture problem. Furthermore, the selection of an appropriate motion model is of crucial importance.
A number of effective algorithms have been proposed to address the problem of motion segmentation, many of which produce convincing results on quite complex motion sequences. These algorithms differ in the kind of information they use (sparse features versus dense intensity information) and the motion model they impose (2D parametric versus motion in 3D). Some of these approaches also recognize the importance of combining optical flow measurements with intensity information to solve the problem of motion segmentation.
Motion segmentation approaches that use dense intensity information largely impose 2D parametric motion models (mostly translation or affine). These include layered representations [44,46] (see also [27]; [26,45] attempt to relax some of the main requirements of layered approaches), variational methods [30,28], graph-cut algorithms [31,37,40], and sequential dominant motion removal [38]. Handling 3D motion is usually achieved by extracting and tracking a sparse set of features. Among such feature-based approaches are subspace methods, which are restricted to orthographic projection [29,33,36,45] ([47] attempt to apply these methods directly to intensities). Other feature-based methods also handle perspective projection [41,42].