Multi-modal identification systems have been growing in popularity over the years, particularly for their relevance to applications in unconstrained environments (e.g., robotics or video surveillance). Multi-modal refers to the use of multiple sources of data from which an identification can be made. The sources of data can be different features of the entity to be identified.
For example, a person can be identified by a number of features, including face, height, body shape, gait, voice, etc. However, the features do not contribute equally to identifying a person. For instance, face and voice features can be highly discriminative in the identification process, while other features, such as gait or body shape, are only mildly discriminative. Even though high recognition rates can be achieved when classifying the more discriminative features, such features are typically observed only rarely. For example, in a surveillance video sequence the face image can only be used if the person is close enough to the camera and is facing it. Similarly, a person's voice can only be used when the person actually speaks. In contrast, less discriminative features tend to be plentiful.
In pattern recognition, multiple classifiers can be combined in order to improve the recognition rate of a given classification system. Many comparisons have been made between alternative combination rules, such as the sum and product rules. In particular, the product rule is optimal when the classifiers in the ensemble are independent, while the sum (or mean) rule is preferred when they are correlated or their estimates are noisy. Rank order statistics rules (e.g., min/max) are more robust to outliers than the sum rule, but typically do not offer as much reduction in the error variance.
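The fixed combination rules named above can be illustrated with a minimal sketch. It assumes each classifier outputs one posterior probability estimate per class; the function name combine, the rule labels, and the example scores are hypothetical and chosen only for illustration, not taken from any particular system.

```python
from math import prod


def combine(posteriors, rule="sum"):
    """Fuse per-classifier class posteriors with a fixed combination rule.

    posteriors: list of rows, one row per classifier; each row holds that
    classifier's estimated probability for every class (hypothetical input
    layout, assumed for this sketch).
    Returns the fused scores, renormalized to sum to 1 when possible.
    """
    rules = {
        "sum": lambda col: sum(col) / len(col),  # mean of the estimates
        "product": prod,                         # product of the estimates
        "min": min,                              # rank order statistic
        "max": max,                              # rank order statistic
    }
    fuse = rules[rule]
    # zip(*posteriors) regroups the scores by class instead of by classifier.
    scores = [fuse(col) for col in zip(*posteriors)]
    total = sum(scores)
    # If every fused score is zero, return the raw scores unnormalized.
    return [s / total for s in scores] if total > 0 else scores


# Illustrative scores only: two classifiers favor class 0, while a third
# assigns class 0 a very low probability.
example = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.05, 0.50, 0.45],
]

for rule in ("sum", "product", "min", "max"):
    fused = combine(example, rule)
    print(rule, [round(p, 3) for p in fused], "class", fused.index(max(fused)))
```

With these illustrative scores the sum rule selects class 0 and the product rule selects class 1, because a single low estimate from one classifier can effectively veto a class under the product rule, whereas the sum rule averages it out.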
What is needed is a multi-modal identification system that utilizes a classifier combination framework.