There is a large class of applications that depend upon the ability to localize a model of an object in an image, a task known as "registration." These applications can be roughly categorized into detection, alignment, and tracking problems.
Detection problems involve, for example, finding objects in image databases or finding faces in surveillance video. The model in a detection problem is usually generic, describing a class of objects. For example, in a prior art face detection system, the object model is a neural network template that describes all frontal, upright faces. See Rowley et al., "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), pages 23-38, January 1998. Another example is locating armored vehicles in images for a military targeting system.
An example of an alignment application is mosaicing, in which a single large image is constructed from a series of smaller overlapping images. In this application, each model is simply an image to be added incrementally to the mosaic. The alignment goal is to position each new image so that it is consistent with the current mosaic wherever the two overlap. A description is given in Irani et al., "Mosaic based representations of video sequences and their applications," Proceedings of Int. Conference on Computer Vision, pages 605-611, Cambridge, Mass., 1995.
Another example is the alignment of plural images obtained from different sensors, e.g., aligning remote-sensed images obtained via normal and infra-red photography, or aligning MRI and SPECT medical images. This allows different regions of an image to be analyzed via multimodal (i.e., vector) measurements instead of scalar pixel intensities. These and other applications are further discussed in the survey on image registration, Brown, "A survey of image registration techniques," ACM Computing Surveys, 24(4), pages 325-376, 1992.
In tracking applications, the models are typically specific descriptions of an image object that is moving through a video sequence. Examples include tracking people for surveillance or user-interface purposes. In figure tracking for surveillance, a stick-figure model of a person evolves over time, matched to the location of a person in a video sequence. A representative prior method is Cham et al., "A multiple hypothesis approach to figure tracking," Proceedings Computer Vision and Pattern Recognition, pages 239-245, Fort Collins, Colo., 1999. In user-interface applications, the user's gaze direction or head pose may be tracked to determine their focus-of-attention. A prior method is described in Oliver et al., "LAFTER: Lips and face real time tracker," Proceedings Computer Vision and Pattern Recognition, pages 123-129, San Juan, PR, Jun. 17-19, 1997.
In each of these application areas, there is a desire to handle increasingly sophisticated object models, which is fueled by the increasing demand for sensing technologies. For example, modern user interfaces may be based on tracking the full-body pose of a user to facilitate gesture recognition. As the complexity of the model increases, the computational cost of registration rises dramatically. A naive registration method such as exhaustive search would result in a slow, inefficient system for a complex object like the human figure. A fast and reliable solution, however, would support advanced applications in content-based image and video editing and retrieval, surveillance, advanced user interfaces, and military targeting systems.
Therefore, there is a need for a registration method which is computationally efficient in the presence of complex object models.
There are many situations when different models are used for registering the same object. For example, an image indexing engine may register objects in images with different models using different feature sets such as color or outline shape depending on the query. Another example would be a visual surveillance network which tracks a person in multiple cameras, using separate full-body kinematic models in each camera view.
A simple approach is to register each of the models independently. However, this does not take advantage of the redundancy between the models and additionally suffers from:
1. Reduced Accuracy
By failing to exchange information between the processes, registration errors can accumulate independently on the different processes, leading to reduced accuracies for all model states;
2. Discrepancies Between Model States
Independent registration also results in the violation of inherent constraints between model states. For example, tracking a single object using independent sensors may result in the sensors reporting very different positions for the object after some elapsed time. This may result from one of the sensors being distracted at an earlier instant by outlier noise such as background clutter. If the constraint of common position were enforced, tracking would be more robust as errors are resolved earlier; and
3. Poor Efficiency
The overall registration is only as fast as the slowest process. Hence if one of the processes is inefficient due to its features having high matching ambiguities, the overall registration speed is reduced. As an example, consider tracking a human using separate processes in separate cameras in a stereo setup. If the person moves partially out of the field-of-view of one of the cameras, the associated tracking process may be significantly slower if for example only the smaller features such as the arms or legs are visible. The overall tracking speed is reduced even though the second tracking process which has the full view of the person may be running at full speed.
Some methods take advantage of the redundancy by enforcing constraints between the states of the different models. This step typically involves correcting the model states after initial estimates have been made by registering all the features. For example, in the case of tracking, this corrective step is taken at each time frame after the registration processes have been completed. While this may resolve problems with reduced accuracies and state discrepancies (i.e., items 1 and 2), it does not improve the poor efficiency, because the rate-limiting step still depends on the slowest registration process.
The present invention registers a plurality of object models in at least one image, one feature per model at a time. Until now, sequential feature registration has been done in an order predetermined before any registration begins. The Applicants have found that the process of registration can be optimized by determining a feature registration order dynamically, that is, at the beginning of each registration step, selecting the feature whose registration will be most cost effective and searching for that feature. In addition, only a subset of the models is selected for registration during each cycle in the registration process. After registering features of the selected object models, the other object models' states are updated according to inter- and intra-model constraints.
Accordingly, in a preferred method of registering a plurality of object models in at least one image, where each object model has a plurality of features and is described by a model state, an unregistered feature of each object model is selected such that an associated cost function of a subsequent search is minimized. A subset of the object models is selected responsive to the selected features. For each selected object model, a search is performed for a match of the associated selected model feature to the image or images to register the feature, and the model state is updated accordingly. The model states of some or all of the object models are then updated according to a set of constraints. These steps are repeated until all features have been registered.
In one embodiment, the selected unregistered features are ranked according to some criterion, such as the number of operations needed to search for a feature, i.e., the matching ambiguity. Object models are then selected according to the ranking. Preferably, a predetermined number of object models is selected each cycle.
In a preferred embodiment, just one object model, the object model having the smallest matching ambiguity among all object models, is selected.
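The ranking and selection step can be illustrated with a minimal Python sketch. It assumes that each object model has already chosen its candidate feature and reports a single scalar matching ambiguity for it. The function name, the dictionary layout, and the example values are hypothetical and are not taken from the described method.

```python
from typing import Dict, List


def select_operative_models(ambiguity: Dict[str, float], count: int = 1) -> List[str]:
    """Return the `count` model names with the smallest matching ambiguity.

    `ambiguity` maps a (hypothetical) model name to the matching ambiguity
    of the feature that model has currently selected.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # Rank models from least to most ambiguous, then keep the first `count`.
    ranked = sorted(ambiguity, key=lambda name: ambiguity[name])
    return ranked[:count]


# Illustrative values only: three models, each with one selected feature.
ambiguity = {"camera_a": 40.0, "camera_b": 12.0, "camera_c": 25.0}
operative = select_operative_models(ambiguity, count=1)
```

With `count=1` this corresponds to the single-model case described above; a larger `count` corresponds to selecting a predetermined number of object models per cycle.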
Constraints can be intra-model constraints and/or inter-model constraints, and restrict the model states to a shared relationship.
Preferably, each search is performed in a region of high probability of a match. The cost function for a feature is based on the feature's basin of attraction, and in particular can be based on the complexity of the search process at each basin of attraction.
A search region can be based on a projected state probability distribution. A search is preferably based on maximizing a comparison function.
Selecting and searching are preferably responsive to a propagated state probability distribution. The state probability distribution is projected into feature space.
For each unregistered feature, the number of search operations required to find a match with at least a predetermined probability is determined, and the feature requiring the fewest search operations is selected.
The number of required search operations is determined by first finding search regions within a feature space, where each region has an associated probability density which exceeds some predetermined threshold. A total probability is then formed by summing the probabilities associated with each of the found search regions. If the total probability is less than the predetermined probability, the threshold is lowered, new regions are found, and the probabilities are summed again. This process repeats until the total probability is greater than or equal to the predetermined probability. Finally, the number of required search operations is computed, based on the found search regions.
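The threshold-lowering loop can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the feature space is a one-dimensional grid of equally sized cells, a "search region" is the set of cells whose density exceeds the threshold, and the threshold is lowered by a fixed multiplicative factor. The function name, the `shrink` factor, and the floor used to guarantee termination are all hypothetical choices.

```python
from typing import List, Tuple


def find_search_cells(
    density: List[float],
    cell_width: float,
    target_prob: float,
    start_threshold: float,
    shrink: float = 0.5,
) -> Tuple[List[int], float]:
    """Lower a density threshold until the selected cells hold `target_prob`.

    Returns the indices of the selected cells and the probability they cover.
    """
    if not 0.0 < target_prob <= 1.0:
        raise ValueError("target_prob must be in (0, 1]")
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be in (0, 1)")
    threshold = start_threshold
    while True:
        # Search region: every cell whose density exceeds the threshold.
        cells = [i for i, d in enumerate(density) if d > threshold]
        # Probability of a cell is approximated as density times cell width.
        total = sum(density[i] * cell_width for i in cells)
        # Stop once enough probability is covered (or the threshold is
        # effectively zero, an assumed guard against endless looping).
        if total >= target_prob or threshold < 1e-12:
            return cells, total
        threshold *= shrink


# Illustrative density over five unit-width cells (sums to 1.0).
density = [0.05, 0.1, 0.5, 0.25, 0.1]
cells, covered = find_search_cells(density, 1.0, target_prob=0.7, start_threshold=0.4)
```

In this toy example the first threshold selects only the peak cell, which covers less than the target probability, so the threshold is halved and a second cell is admitted. How the number of search operations is then derived from the selected regions depends on the matching mode, as described in the following paragraphs.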
Searching can comprise feature-to-feature matching. In this case, the number of search operations is preferably the number of target features located within each search region. The number of target features located within the search region can be based on Mahalanobis distances to potential target features.
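As an illustration of counting target features by Mahalanobis distance, the following Python sketch assumes a two-dimensional feature space, a Gaussian search region described by a mean and a symmetric 2x2 covariance, and a fixed gate radius. The function names, the gate value, and the sample points are assumptions made for this example only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_sq(point: Point, mean: Point, cov: Cov2) -> float:
    """Squared Mahalanobis distance for a symmetric 2x2 covariance."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^T * inverse(cov) * d, with the 2x2 inverse written out explicitly.
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def count_targets_in_gate(
    targets: Sequence[Point], mean: Point, cov: Cov2, gate: float
) -> int:
    """Count target features whose Mahalanobis distance is within `gate`."""
    return sum(1 for t in targets if mahalanobis_sq(t, mean, cov) <= gate * gate)


# Illustrative data: search region elongated along the x axis.
targets = [(1.0, 0.0), (3.0, 1.0), (0.0, 3.0), (5.0, 0.0)]
n_ops = count_targets_in_gate(targets, (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)), gate=2.0)
```

Here the returned count plays the role of the number of search operations for that feature: each target inside the gate is one candidate match to evaluate.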
In a preferred embodiment, target features are approximately uniformly distributed, and the number of features is proportional to the search region's size. Thus, the features can be ranked according to the sizes of the associated search regions.
Alternatively, searching can comprise feature-to-image matching. Here, the number of required search operations is computed, for each search region, by first dividing the region into minimally-overlapping volumes which have the same size and shape as a basin of attraction associated with the feature, and then by counting the number of volumes required to cover the regions.
In some embodiments, feature-to-feature matching can be employed for some object models while feature-to-image matching is employed for other object models.
In a preferred embodiment, counting the volumes is approximated by obtaining eigenvalues and eigenvectors of a covariance matrix associated with the feature search region. A basin of attraction span is calculated for each eigenvector direction, and a count is approximated from the eigenvalues and the spans.
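One way such an approximation might look is sketched below in Python for a two-dimensional feature space. The specific formula is an assumption, not the formula of the described method: the search region is taken to extend a fixed number of standard deviations along each eigenvector, the basin of attraction is modeled as an axis-aligned ellipse, and the count is the product over eigenvector directions of the region extent divided by the basin span, rounded up. The function name and the parameters `basin_semi_axes` and `num_sigmas` are hypothetical.

```python
import math
from typing import Tuple

Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


def approx_volume_count(
    cov: Cov2, basin_semi_axes: Tuple[float, float], num_sigmas: float = 2.0
) -> int:
    """Approximate how many basin-sized volumes cover a Gaussian search region."""
    (a, b), (_, c) = cov
    # Closed-form eigen-decomposition of a symmetric 2x2 matrix.
    mean = 0.5 * (a + c)
    diff = math.hypot(0.5 * (a - c), b)
    eigenvalues = (mean + diff, mean - diff)
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # angle of the major axis
    eigenvectors = (
        (math.cos(theta), math.sin(theta)),
        (-math.sin(theta), math.cos(theta)),
    )
    rx, ry = basin_semi_axes
    count = 1
    for lam, (ux, uy) in zip(eigenvalues, eigenvectors):
        # Span of the elliptical basin along this eigenvector direction
        # (length of the chord through the basin center).
        span = 2.0 / math.hypot(ux / rx, uy / ry)
        # Assumed extent of the search region along the same direction.
        extent = 2.0 * num_sigmas * math.sqrt(max(lam, 0.0))
        count *= max(1, math.ceil(extent / span))
    return count


# Illustrative values: variance 9 along x, 1 along y, circular basin of radius 1.
n_volumes = approx_volume_count(((9.0, 0.0), (0.0, 1.0)), (1.0, 1.0))
```

The benefit of this kind of approximation is that it avoids explicitly tiling the search region; only a small eigen-decomposition and a few divisions are needed per feature.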
Model states of the selected object models are updated according to a propagated state probability distribution. Preferably, the propagation of the probability distribution for an object model is based on successive registered features.
The proposed invention is thus a framework for integrating a plurality of sequential feature registration processes. The processes have separate model states and sets of features. The framework allows only a subset of the processes to carry out feature search during each feature matching cycle. The processes in this subset are referred to as "operative processes." In a preferred embodiment, there is only one operative process during each cycle.
The framework chooses the operative processes by ranking the currently selected features in all processes. The operative processes are associated with the smallest matching ambiguities in the selected features.
The framework also enforces any given set of constraints which apply to any model state. This may include constraints which restrict a plurality of model states to some shared relationship. Constraint satisfaction is carried out after the operative processes have completed feature search and state update. All model states are then minimally modified such that the given set of constraints is satisfied.
In a preferred embodiment, linear constraints are satisfied via the method of Lagrange multipliers.
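For the simplest case of a single linear constraint, the Lagrange-multiplier solution has a closed form, sketched here in Python. The sketch assumes that "minimally modified" means the smallest change in the Euclidean sense, that all model states are stacked into one vector x, and that the constraint is a·x = b. Minimizing ½‖x′ − x‖² + λ(a·x′ − b) gives x′ = x − λa with λ = (a·x − b)/(a·a). The function name and the example values are hypothetical.

```python
from typing import List, Sequence


def enforce_linear_constraint(
    x: Sequence[float], a: Sequence[float], b: float
) -> List[float]:
    """Return the point closest to `x` (Euclidean) that satisfies a . x = b."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint vector a must be non-zero")
    # Lagrange multiplier from substituting x' = x - lam * a into a . x' = b.
    lam = (sum(ai * xi for ai, xi in zip(a, x)) - b) / norm_sq
    return [xi - lam * ai for xi, ai in zip(x, a)]


# Illustrative case: two processes report positions 10 and 14 for the same
# object; the shared-position constraint is x[0] - x[1] = 0.
corrected = enforce_linear_constraint([10.0, 14.0], [1.0, -1.0], 0.0)
```

In this example both states move by the same amount toward each other and meet at 12, which is the smallest joint change that satisfies the shared-position constraint. Several simultaneous linear constraints would require solving a small linear system for the vector of multipliers instead of a single division.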