There is a large class of applications that depend upon the ability to localize a model of an object in an image, a task known as “registration.” These applications can be roughly categorized into detection, alignment, and tracking problems.
Detection problems involve, for example, finding objects in image databases or finding faces in surveillance video. The model in a detection problem is usually generic, describing a class of objects. For example, in a prior art face detection system, the object model is a neural network template that describes all frontal, upright faces. See Rowley et al., “Neural network-based face detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), pages 23-38, January 1998. Another example is locating armored vehicles in images for a military targeting system.
An example of an alignment application is mosaicing, in which a single large image is constructed from a series of smaller overlapping images. In this application, each model is simply an image to be added incrementally to the mosaic. The alignment goal is to position each new image so that it is consistent with the current mosaic wherever the two overlap. A description is given in Irani et al., “Mosaic based representations of video sequences and their applications,” Proceedings of Int. Conference on Computer Vision, pages 605-611, Cambridge, Mass., 1995.
Another example is the alignment of plural images obtained from different sensors, e.g. aligning remote-sensed images obtained via normal and infra-red photography, or aligning MRI and SPECT medical images. This allows different regions of an image to be analyzed via multimodal (i.e., vector) measurements instead of scalar pixel intensities. These and other applications are further discussed in the survey on image registration, Brown, “A survey of image registration techniques,” ACM Computing Surveys, 24(4), pages 325-376, 1992.
In tracking applications, the models are typically specific descriptions of an image object that is moving through a video sequence. Examples include tracking people for surveillance or user-interface purposes. In figure tracking for surveillance, a stick-figure model of a person evolves over time, matched to the location of a person in a video sequence. A representative prior method is Cham et al., “A multiple hypothesis approach to figure tracking,” Proceedings Computer Vision and Pattern Recognition, pages 239-245, Fort Collins, Colo., 1999. In user-interface applications, the user's gaze direction or head pose may be tracked to determine their focus-of-attention. A prior method is described in Oliver et al., “LAFTER: Lips and face real time tracker,” Proceedings Computer Vision and Pattern Recognition, pages 123-129, San Juan, PR, Jun. 17-19, 1997.
In each of these application areas, there is a desire to handle increasingly sophisticated object models, which is fueled by the increasing demand for sensing technologies. For example, modern user interfaces may be based on tracking the full-body pose of a user to facilitate gesture recognition. As the complexity of the model increases, the computational cost of registration rises dramatically. A naive registration method such as exhaustive search would result in a slow, inefficient system for a complex object like the human figure. However, a fast and reliable solution would support advanced applications in content-based image and video editing and retrieval, surveillance, advanced user-interfaces, and military targeting systems.
Therefore, there is a need for a registration method which is computationally efficient in the presence of complex object models.
The present invention registers a model in an image sequentially, that is, one feature at a time. Until now, sequential feature registration has been done in an order predetermined before any registration begins. The Applicants have found that the process of registration can be optimized by determining a feature registration order dynamically, that is, at the beginning of each registration step, selecting the feature whose registration will be most cost effective and searching for that feature.
Accordingly, in a preferred method of registering an object model in an image, where the object model has a plurality of features and is described by a model state, an unregistered feature of the object model is selected such that the cost function of a subsequent search is minimized. A search is performed for a match of the selected model feature to the image to register the feature, and the model state is updated accordingly. These steps are repeated until all features have been registered.
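The select, search, and update steps described above can be sketched as a simple greedy loop. This is a minimal illustration only: the function name `register_sequentially` and the three callables `search_cost`, `search_and_match`, and `update_state` are hypothetical placeholders for the cost function, the search, and the state update, none of which are specified in this form by the text.

```python
from typing import Any, Callable, Hashable, Iterable, List, Tuple


def register_sequentially(
    features: Iterable[Hashable],
    state: Any,
    search_cost: Callable[[Hashable, Any], float],
    search_and_match: Callable[[Hashable, Any], Any],
    update_state: Callable[[Any, Hashable, Any], Any],
) -> Tuple[Any, List[Hashable]]:
    """Register features one at a time, choosing the order dynamically.

    All three callables are assumed to be supplied by the caller; they
    stand in for the cost function, search, and state update of the text.
    """
    remaining = list(features)
    order: List[Hashable] = []
    while remaining:
        # Select the unregistered feature whose search is currently cheapest.
        best = min(remaining, key=lambda f: search_cost(f, state))
        # Search for a match of that feature in the image.
        match = search_and_match(best, state)
        # Update the model state with the newly registered feature.
        state = update_state(state, best, match)
        remaining.remove(best)
        order.append(best)
    return state, order
```

Because the cost is re-evaluated against the updated state at every step, a feature that is expensive to find at the start may become cheap once related features have been registered.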
Preferably, the search is performed in a region of high probability of a match. The cost function for a feature is based on the feature's basin of attraction, and in particular can be based on the complexity of the search process at each basin of attraction.
A search region can be based on a projected state probability distribution. A search is preferably based on maximizing a comparison function.
Selecting and searching are preferably responsive to a propagated state probability distribution. The state probability distribution is projected into feature space.
For each unregistered feature, the number of search operations required to find a match with at least a predetermined probability is determined, and the feature requiring the fewest search operations is selected.
The number of required search operations is determined by first finding search regions within a feature space, where each region has an associated probability density which exceeds some predetermined threshold. A total probability is then formed by summing the probabilities associated with each of the found search regions. If the total probability is less than the predetermined probability, the threshold is lowered and new regions are found and the probabilities summed. This process repeats until the total probability is greater than or equal to the predetermined probability. Finally, the number of required search operations is computed, based on the found search regions.
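The threshold-lowering procedure can be illustrated on a discretized feature space. The sketch below assumes the projected density has already been sampled on a regular grid of equal-volume cells; the function name `find_search_cells`, the halving schedule `shrink`, and the use of grid cells as a stand-in for search regions are all assumptions made for illustration.

```python
import numpy as np


def find_search_cells(density, cell_volume, p_min, shrink=0.5):
    """Lower a density threshold until the selected cells hold mass >= p_min.

    density     : density sampled on a grid (any shape), assumed input
    cell_volume : volume of one grid cell
    p_min       : required total probability of the search regions
    shrink      : factor by which the threshold is lowered (an assumption)
    Returns (boolean mask of selected cells, total probability covered).
    """
    shape = np.shape(density)
    flat = np.asarray(density, dtype=float).ravel()
    if flat.sum() * cell_volume < p_min:
        raise ValueError("grid does not contain enough probability mass")
    threshold = flat.max()
    while True:
        # Cells whose density meets the current threshold.
        mask = flat >= threshold
        total = flat[mask].sum() * cell_volume
        if total >= p_min:
            return mask.reshape(shape), total
        # Not enough mass yet: lower the threshold and try again.
        threshold *= shrink
```

The selected cells can then be counted, or grouped into connected regions, to estimate the number of search operations for the feature.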
Searching can comprise feature-to-feature matching. In this case, the number of search operations is preferably the number of target features located within each search region. The number of target features located within the search region can be based on Mahalanobis distances to potential target features.
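For feature-to-feature matching, counting the target features inside a search region can be sketched with a Mahalanobis gate. This example assumes the search region is the ellipsoid defined by a Gaussian mean and covariance in feature space and a gate radius `gate`; the function name and the gate parameter are illustrative, not taken from the text.

```python
import numpy as np


def count_targets_in_region(mean, cov, targets, gate):
    """Count target features within Mahalanobis distance `gate` of `mean`.

    mean    : (d,) predicted feature location in feature space
    cov     : (d, d) covariance of the projected state distribution
    targets : (n, d) candidate target feature locations
    gate    : Mahalanobis radius of the search region (assumed parameter)
    """
    diff = np.atleast_2d(np.asarray(targets, dtype=float)) - np.asarray(mean, dtype=float)
    cov_inv = np.linalg.inv(np.asarray(cov, dtype=float))
    # Squared Mahalanobis distance of every target to the predicted location.
    d2 = np.einsum("ni,ij,nj->n", diff, cov_inv, diff)
    return int(np.count_nonzero(d2 <= gate ** 2))
```

Evaluating this count for every unregistered feature and choosing the feature with the smallest count gives one possible realization of the selection rule.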
In a preferred embodiment, target features are approximately uniformly distributed, and the number of features is proportional to the search region's size. Thus, the features can be ranked according to the sizes of the associated search regions.
Alternatively, searching can comprise feature-to-image matching. Here, the number of required search operations is computed, for each search region, by first dividing the region into minimally-overlapping volumes which have the same size and shape as a basin of attraction associated with the feature, and then by counting the number of volumes required to cover the regions.
In a preferred embodiment, counting the volumes is approximated by obtaining eigenvalues and eigenvectors of a covariance matrix associated with the feature search region. A basin of attraction span is calculated for each eigenvector direction, and a count is approximated from the eigenvalues and the spans.
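One way to read this approximation is sketched below. It assumes the search region extends `n_sigma` standard deviations on either side of the mean along each eigenvector, and that the caller supplies a hypothetical `basin_span(direction)` function returning the width of the basin of attraction along a unit direction. The per-axis ceiling and the product over axes are assumptions of this sketch, not a formula given in the text.

```python
import numpy as np


def approx_attractor_count(cov, basin_span, n_sigma=2.0):
    """Approximate how many basin-sized volumes cover a Gaussian search region.

    cov        : (d, d) covariance of the feature search region
    basin_span : callable, unit direction -> basin width along it (assumed)
    n_sigma    : half-extent of the region in standard deviations (assumed)
    """
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    count = 1
    for lam, direction in zip(eigvals, eigvecs.T):
        # Full extent of the search region along this eigenvector.
        extent = 2.0 * n_sigma * np.sqrt(max(lam, 0.0))
        # Number of basin widths needed to cover that extent (at least one).
        per_axis = max(1, int(np.ceil(extent / basin_span(direction))))
        count *= per_axis
    return count
```

A feature with a wide basin of attraction relative to its search region yields a small count, and is therefore a cheap candidate for the next registration step.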
The model state is updated according to a propagated state probability distribution. Preferably, the propagation of the probability distribution is based on successive registered features.
The present invention is thus a method and apparatus for sequential feature registration in which features are selected dynamically so as to minimize total matching ambiguity, based on a propagated state probability model. The method can be applied both to feature-to-feature matching and to feature-to-image matching.
The computation of matching ambiguity is based on a state probability model projection from model space to feature space, and on the feature or image data.
In the preferred embodiment of feature-to-feature matching, matching ambiguity is given by the number of target features located inside the search region of a source feature.
In the preferred embodiment of feature-to-image matching, matching ambiguity is given by the minimum number of attractor regions which are required to span the search region of a source feature.
In the preferred embodiment, the state probability model is a Gaussian distribution and is propagated using the Kalman filter update step. Matching ambiguity is defined in terms of a feature search region, i.e., a high probability region in feature space.
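The standard Kalman filter measurement update for a Gaussian state can be sketched as follows. The sketch assumes a linear (or linearized) measurement model with matrix `H` and measurement noise covariance `R`; these symbols, and the function name `kalman_update`, are introduced here for illustration. The innovation covariance `S` computed inside is also the covariance of the state distribution projected into feature space, which defines the feature search region.

```python
import numpy as np


def kalman_update(x, P, H, R, z):
    """One Kalman measurement update for a Gaussian state estimate.

    x : (n,) state mean            P : (n, n) state covariance
    H : (m, n) measurement matrix  R : (m, m) measurement noise covariance
    z : (m,) observed location of the registered feature
    Returns the updated mean and covariance.
    """
    x = np.asarray(x, dtype=float)
    P = np.atleast_2d(np.asarray(P, dtype=float))
    H = np.atleast_2d(np.asarray(H, dtype=float))
    R = np.atleast_2d(np.asarray(R, dtype=float))
    z = np.asarray(z, dtype=float)
    # Covariance of the state distribution projected into feature space.
    S = H @ P @ H.T + R
    # Kalman gain.
    K = P @ H.T @ np.linalg.inv(S)
    # Correct the mean with the innovation and shrink the covariance.
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(x.size) - K @ H) @ P
    return x_new, P_new
```

Each registered feature reduces the state covariance, which in turn shrinks the search regions of the features that remain.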
In another preferred embodiment, not all of the available features are matched. Matching continues until a desired level of certainty in the estimate has been achieved.
In yet another preferred embodiment, optimal feature orders are computed over a corpus of examples of registration tasks. A single fixed ordering is obtained from the training set of optimal orderings.
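The text does not state how the single fixed ordering is derived from the training set. As one hypothetical possibility, the sketch below ranks features by their mean position across the example orderings; this mean-rank rule, and the function name, are assumptions made purely for illustration. It also assumes every example ordering contains every feature.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def fixed_order_from_examples(orderings: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Derive one fixed feature order from per-example optimal orderings.

    Uses mean rank as the aggregation rule (an assumption of this sketch);
    ties are broken by the string form of the feature for determinism.
    """
    rank_sum = defaultdict(float)
    for order in orderings:
        for rank, feature in enumerate(order):
            rank_sum[feature] += rank
    n = len(orderings)
    return sorted(rank_sum, key=lambda f: (rank_sum[f] / n, str(f)))
```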