1. Field of the Invention
The invention relates in general to biometric identification, and more particularly, to a surveillance system using biometric identification.
2. Brief Description of the Related Art
The state of the art of applying biometric technologies to authenticate and positively determine the identity of a person still faces several technical challenges. Specifically, the challenges can be categorized into two aspects: data acquisition and data matching. Data acquisition deals with acquiring biometric data from individuals. Data matching deals with matching biometric data both quickly and accurately. These challenges can be illustrated by a port-entry scenario. In such a setting, it is difficult to obtain certain biometric data such as DNA and voice samples of individuals. For biometric data that can be more easily acquired, such as face images and fingerprints, the acquired data quality can vary greatly depending on acquisition devices, environmental factors (e.g., lighting condition), and individual cooperation. Tradeoffs exist between intrusiveness of data collection, data collection speed, and data quality.
Once the needed data have been acquired, conducting matching in a very large database can be very time-consuming. Unless a system can acquire and match data in both a timely and an accurate manner, the system is practically useless in improving public security, where the inconvenience due to the intrusive data-acquisition process and the time-consuming matching process ought to be minimized.
A biometric system typically aims to address one of the following issues: 1) Authentication: is the person the one he/she claims to be? 2) Recognition: who is the person? In the first case, data acquisition is voluntary and matching is done in a one-to-one fashion, matching the acquired data with the data stored on an ID card or in a database. In the second case, individuals may not be cooperating, and the system must conduct searches in a very large repository.
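By way of illustration only, the difference between one-to-one authentication and one-to-many recognition can be sketched as follows. The feature vectors, the Euclidean distance measure, the threshold value, and the names authenticate, recognize, and gallery are assumptions made for this example and are not taken from any particular system.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def authenticate(probe, claimed_id, gallery, threshold):
    """One-to-one: compare the probe only with the claimed identity's template."""
    template = gallery.get(claimed_id)
    return template is not None and distance(probe, template) <= threshold


def recognize(probe, gallery, threshold):
    """One-to-many: search every stored template for the closest match."""
    best_id, best_dist = None, float("inf")
    for identity, template in gallery.items():
        d = distance(probe, template)
        if d < best_dist:
            best_id, best_dist = identity, d
    # Reject the probe when even the closest template is too far away.
    return best_id if best_dist <= threshold else None


# Hypothetical two-person gallery of enrolled templates.
gallery = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
```

Authentication touches a single template regardless of the size of the repository, whereas the cost of recognition in this naive form grows linearly with the number of enrolled individuals, which is why matching speed becomes critical in the second case.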
The prior art in biometrics can be discussed in two parts: single-modal solutions and multi-modal solutions. Several systems have been built to use one of the following single modalities: facial data, voice, fingerprint, iris or DNA. The effectiveness of these single-modal approaches can be evaluated by three metrics: the degree of intrusiveness, speed and accuracy. From the perspective of a user, acquiring facial data can be the most noninvasive method when video cameras are mounted at a distance. However, this same convenience often compromises data quality. A more intrusive face acquisition method is to acquire frontal face features, which requires cooperation from individuals. Voice is another popular modality. However, traditional voice recognition fails miserably when voice samples of multiple individuals are simultaneously captured or when background noise exists. Even when the acquired voice data can be “pure,” existing signal processing and matching techniques can hardly achieve recognition accuracy of more than 50%. The next popular modality is fingerprint, which can achieve much higher recognition accuracy at the expense of intrusive data acquisition and time-consuming data matching. Finally, DNA is by far the most accurate recognition technique, but the accompanying inconvenience in data acquisition and the computational complexity are both exceedingly high. In summary, among the single-modal approaches, non-intrusive data-acquisition techniques tend to suffer from low recognition accuracy, and intrusive data-acquisition techniques tend to suffer from long computational time.
As to multimodal techniques, several prior art United States patents and patent applications disclose such techniques. However, as will be further discussed below, these disclosures do not provide scalable means to deal with tradeoffs between non-intrusiveness, speed and accuracy requirements. These disclosures may fix their system configuration for a particular application, and cannot adapt to queries of different requirements and of different applications.
Wood et al. disclose in U.S. Pat. No. 6,609,198 a security architecture using the information provided in a single sign-on in multiple information resources. Instead of using a single authentication scheme for all information resources, the security architecture associates trust-level requirements with information resources. Authentication schemes (e.g., those based on passwords, certificates, biometric techniques, smart cards, etc.) are employed depending on the trust-level requirement(s) of an information resource (or information resources) to be accessed. Once credentials have been obtained for an entity and the entity has been authenticated to a given trust level, access is granted, without the need for further credentials and authentication, to information resources for which the authenticated trust level is sufficient. The security architecture also allows upgrade of credentials for a given session. The credential levels and upgrade scheme may be useful for a log-on session; however, such architecture and method of operations do not provide a resolution for high speed and high accuracy applications such as passenger security check in an airport.
Sullivan et al. disclose in U.S. Pat. No. 6,591,224 a method and apparatus for providing a standardized measure of accuracy of each biometric device in a biometric identity authentication system having multiple users. A statistical database includes continually updated values of false acceptance rate and false rejection rate for each combination of user, biometric device and biometric device comparison score. False acceptance rate data are accumulated each time a user successfully accesses the system, by comparing the user's currently obtained biometric data with stored templates of all other users of the same device. Each user is treated as an “impostor” with respect to the other users, and the probability of an impostor's obtaining each possible comparison score is computed with accumulated data each time a successful access is made to the system. The statistical database also contains a false rejection rate, accumulated during a test phase, for each combination of user, biometric device and biometric device comparison score. By utilizing a biometric score normalizer, Sullivan's method and apparatus may be useful for improving the accuracy of a biometric device through acquiring more training data.
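For reference, the two error rates mentioned above can be computed from comparison scores as in the following minimal sketch. It assumes that a higher score means a better match and that a single global threshold is used; it is a generic illustration and not the per-user, per-device statistical database of Sullivan et al. The function name far_frr and the sample scores are hypothetical.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when comparison scores >= threshold are accepted.

    FAR: fraction of impostor comparisons that are wrongly accepted.
    FRR: fraction of genuine comparisons that are wrongly rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Hypothetical comparison scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]    # same-person comparisons
impostor = [0.1, 0.3, 0.5, 0.6]   # different-person comparisons
```

With these sample scores, a threshold of 0.5 accepts two of the four impostor comparisons and rejects one of the four genuine comparisons. Raising the threshold lowers the false acceptance rate at the cost of a higher false rejection rate.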
Murakami et al. disclose in U.S. Pre-Grant Publication 2002-0,138,768 entitled “Method for biometric authentication through layering biometric traits,” a portable biometric authentication system having a single technology for measuring multiple, varied biological traits to provide individual authentication based on a combination of biological traits. At least one of these biometric traits is a live physiological trait, such as a heartbeat waveform, that is substantially, but not necessarily completely, unique to the population of individuals. Preferably, at least one of the identifying aspects of the biological traits is derived from a measurement taken by reflecting light off the subdermal layers of skin tissue. The Murakami et al. approach is limited by the more intrusive measurement techniques needed to obtain data such as a heartbeat waveform and light reflected off the subdermal layers of skin tissue. Such reference data are not immediately available in a typical security check situation for comparison with the biometric data, e.g., heartbeat waveforms and light reflected from the subdermal layers of the skin, of a targeted search subject. Furthermore, the determination or the filtering of persons' identity may be too time consuming and neither appropriate for nor adaptive to real time applications.
Langley discloses in U.S. Pre-Grant Publication 2002-0,126,881, entitled “Method and system for identity verification using multiple simultaneously scanned biometric images,” a method to improve accuracy and speed of biometric identity verification process by use of multiple simultaneous scans of biometric features of a user, such as multiple fingerprints, using multiple scanners of smaller size than would be needed to accommodate all of the fingerprints in a single scanner, and using multiple parallel processors, or a single higher speed processor, to process the fingerprint data more efficiently. Obtaining biometric data from multiple user features by use of multiple scanners increases verification accuracy, but without the higher cost and slower processing speed that would be incurred if a single large scanner were to be used for improved accuracy. The methods according to Langley may provide the advantages of speed and accuracy improvements. However, the nature of requiring multiple scans makes data acquisition time-consuming and intrusive.
On the academic side, much research effort has been geared toward analyzing data from individual biometric channels (e.g., voice, face, fingerprint; please see the reference list for a partial list), while less emphasis has been placed on comparing the performance of different approaches or combining information from multiple biometric channels to improve identification. Some notable exceptions are discussed below. In Lin Hong; Jain, A. K., Integrating faces and fingerprints for personal identification, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, December 1998, pp. 1295-1307, the authors report an automated person identification system that combines face and fingerprint information. The face recognition method employed is the traditional eigenface approach, M. Turk and A. Pentland, Eigenfaces for Recognition, J. Cognitive Neuroscience Vol. 3, No. 1, 1991, pp. 71-96, which computes a set of orthonormal bases (eigenfaces) of the database images using principal component analysis. Face images are then approximated by their projection onto the orthonormal eigenface bases, and compared using Euclidean distances. For fingerprints, the authors extend their previous work, Jain, A. K.; Lin Hong; Bolle, R.; On-line fingerprint verification, Pattern Analysis and Machine Intelligence, Vol. 19, No. 4, April 1997, pp. 302-314, to extract minutiae from fingerprint images. They then align two fingerprint images by computing the transformation (translation and rotation) between them. Minutiae are strung together into a string representation and a dynamic programming-based algorithm is used to compute the minimum edit distance between the two input fingerprint strings. Decision fusion is achieved by cross validation of the top matches identified by the two modules, with matching results weighted by their confidence or accuracy levels. The performance of the system is validated on a database of about 640 face and 640 fingerprint images.
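The dynamic programming step mentioned above can be illustrated with a plain minimum edit distance between two symbol sequences. This is a simplified sketch under stated assumptions: each minutia is reduced to a single hypothetical symbol and every insertion, deletion, or substitution costs one unit. The cited work compares richer minutia descriptions, so this is not the authors' algorithm.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical minutia strings: "e" = ridge ending, "b" = bifurcation.
enrolled = ["e", "b", "e", "e", "b"]
probe = ["e", "b", "e", "b"]
```

Here edit_distance(enrolled, probe) evaluates to 1, because deleting one ridge ending from the enrolled sequence yields the probe sequence. A small distance relative to the sequence lengths suggests that the two prints may come from the same finger.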
In Phillips, P. J.; Hyeonjoon Moon; Rizvi, S. A.; Rauss, P. J., The FERET evaluation methodology for face-recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, October 2000, pp. 1090-1104, the Michigan State University research group extends their information fusion framework to include more modalities. In particular, images of a subject's right hand were captured, and fourteen features comprising the lengths of the fingers, widths of the fingers, and widths of the palm at various locations of the hand were extracted. A Euclidean distance metric was used to compare feature vectors. Simple sum rules, decision trees and linear discriminant functions are used for classification. It is observed that a personal ID system using three modules outperforms one that uses only two of the three modules. While this is an interesting experiment, the data set used is small and there is no accepted universal standard in using hand images in biometrics.
In R. Brunelli, D. Falavigna, T. Poggio and L. Stringa, Automatic Person Recognition by Using Acoustic and Geometric Features, Machine Vision and Applications 1995, Vol. 8 pp. 317-325, an automated person recognition system using voice and face signatures is presented. The speaker recognition subsystem utilizes acoustic parameters (log-energy outputs and their first-order time derivatives from 24 triangular band-pass filters) computed from the spectrum of short-time windows of the speech signal. The face recognition subsystem is based on geometric data represented by a vector describing discriminant facial features such as positions and widths of the nose and mouth, chin shape, thickness and shape of the eyebrows, etc. The system captures static images of the test subjects and the test subjects are also asked to utter ten digits from zero to nine for use in the speaker ID subsystem. Each subsystem then computes the distances of the test subject's speech and face signatures with those stored in the databases. Decisions from the two ID modules are combined by computing a joint matching score that is the sum of the two individual matching scores, weighted by the corresponding variance. Experimental results show that integration of visual and acoustic information enhances both performance and reliability of the separate systems. The above system was later improved upon in Brunelli, R.; Falavigna, D., Person identification using multiple cues, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 10, October 1995, pp. 955-966, where multiple classifiers are used in the face recognition subsystems, and the matching score normalization process is made more robust using robust statistical methods.
In Kittler, J.; Hatef, M.; Duin, R. P. W.; Matas, J., On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, March 1998, pp. 226-239, a performance study of various ensemble classification schemes is presented. It is shown that many existing decision aggregation rules are actually simplifications based on the more general Bayesian rule. The authors compare the performance of different decision aggregation rules (max, min, median, and majority voting rule) by performing an experiment in biometrics. Three modules are used: frontal faces, face profiles, and voiceprints. Simple correlation-based and distance-based matching is performed on frontal faces and face profiles, respectively, by finding a geometric transformation that minimizes the differences in intensity. It is shown that a simple aggregation scheme that sums the results from individual classifiers actually performs the best.
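The aggregation rules named above can be sketched as follows. The sketch assumes that each classifier outputs one score per candidate identity on a comparable scale; the function name fuse and the sample scores are hypothetical and do not reproduce the experiment of the cited study.

```python
from collections import Counter
from statistics import median


def fuse(scores, rule):
    """Combine per-classifier scores and return the winning class label.

    scores: list of dicts, one per classifier, mapping label -> score.
    rule:   one of "sum", "max", "min", "median" or "vote".
    """
    if rule == "vote":
        # Majority voting: each classifier votes for its top-scoring label.
        votes = Counter(max(s, key=s.get) for s in scores)
        return votes.most_common(1)[0][0]
    combine = {"sum": sum, "max": max, "min": min, "median": median}[rule]
    fused = {label: combine([s[label] for s in scores]) for label in scores[0]}
    return max(fused, key=fused.get)


# Hypothetical scores from three modules for two candidate identities.
frontal = {"A": 0.60, "B": 0.40}
profile = {"A": 0.30, "B": 0.70}
voice = {"A": 0.55, "B": 0.45}
modules = [frontal, profile, voice]
```

With these sample scores, the sum, max and min rules select identity B, while the median and majority voting rules select identity A. The example shows that the choice of aggregation rule can change the final decision.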
In Lu X; Wang Y; and Jain A, Combining classifiers for face recognition, IEEE International Conference on Multimedia Systems and Expo, Baltimore, Md., July 2003, three well-known appearance-based face recognition methods, namely PCA, M. Turk and A. Pentland, Eigenfaces for Recognition, J. Cognitive Neuroscience Vol. 3, No. 1, 1991, pp. 71-96, ICA, and LDA, Belhumeur, P. N.; Hespanha, J. P.; Kriegman, D. J., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, July 1997, pp. 711-720, are used for face image classification. Two combination strategies, the sum rule and an RBF network, are used to integrate the outputs from these methods. Experimental results show that while individual methods achieve recognition rates between 80% and 88%, the ensemble classifier boosts the performance to 90%, using either the sum rule or the RBF network. In Senior, A., A combination fingerprint classifier, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 10, October 2001, pp. 1165-1174, a similar multi-classifier scheme, this time for fingerprint classification, is proposed. Hidden Markov Models and decision trees are used to recognize ridge structures of the fingerprint. The accuracy of the combination classifier is shown to be higher than that of two state-of-the-art systems tested under the same conditions. These studies represent encouraging results that validate our multi-modal approach, though only a single biometric channel, either face or fingerprint, rather than a combination of biometric channels, is used in these studies.
Maio, D.; Maltoni, D.; Cappelli, R.; Wayman, J. L.; Jain, A. K., FVC2000: fingerprint verification competition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 3, March 2002, pp. 402-412, documents a fingerprint verification competition that was carried out in conjunction with the International Conference on Pattern Recognition (ICPR) in 2000 (a similar contest was held again in 2002). The aim is to take the first step towards the establishment of a common basis to better understand the state-of-the-art and what can be expected from the fingerprint technology in the future. Over ten participants, including entries from both academia and industry, took part. Four different databases, two created with optical sensors, one with a capacitive sensor, and one synthesized, were used in the validation. Both the enrollment error (if a training image can be ingested into the database or not) and the matching error (if a test image can be assigned the correct label or not) and the average time of enrollment and matching are documented.
A study that is similar in spirit, but compares the performance of face recognition algorithms, is reported in Phillips, P. J.; Hyeonjoon Moon; Rizvi, S. A.; Rauss, P. J., The FERET evaluation methodology for face-recognition algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, October 2000, pp. 1090-1104. A subset of the FERET database (a gallery of over 3000 images) was used in the study. Ten different algorithms, using a wide variety of techniques, such as PCA and Fisher discriminant, were tested. Cumulative matching scores as a function of matching ranks in the database are tabulated and used to compare the performance of different algorithms. This study was repeated three times, in August 1994, March 1995, and July 1996. What is significant about this study is that the performance of the face recognition algorithms improved over the three tests, while the test conditions became more challenging (with increasingly more images in the test datasets).
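A cumulative matching score of the kind described above can be computed as in the following sketch. It assumes that, for each probe image, the rank at which the correct gallery identity appears in the sorted match list is already known (rank 1 being the best match). The function name and the sample ranks are hypothetical.

```python
def cumulative_match_scores(ranks, max_rank):
    """Return the fraction of probes identified within the top k matches.

    ranks:    for each probe, the rank (1 = best) of its correct identity.
    max_rank: largest rank k to report.
    Entry k - 1 of the result is the fraction of probes with rank <= k.
    """
    n = len(ranks)
    return [sum(1 for r in ranks if r <= k) / n for k in range(1, max_rank + 1)]


# Hypothetical ranks of the correct identity for eight probe images.
ranks = [1, 1, 2, 5, 3, 1, 8, 2]
```

For these sample ranks, three of the eight probes are matched correctly at rank 1 (a score of 0.375) and seven of the eight fall within the top five matches (a score of 0.875). Plotting the score against the rank gives a curve that can be used to compare algorithms.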
As can be seen from the above brief survey, multi-modal biometrics holds a lot of promise. It is likely that much more accurate classification results can be obtained by intelligently fusing the results from multiple biometric channels given performance requirements. While it is important to keep on improving the accuracy and applicability of individual biometric sensors and recognizers, the performance of a biometric system can be boosted significantly by judiciously and intelligently employing and combining multiple biometric channels.
While there has been significant research activity in single- and multi-channel biometrics over the past decade, the state of the art is still wanting in terms of speed and accuracy. Therefore, a need still exists in the art to provide new and improved methods and system configurations to increase the speed and accuracy of biometric identity verification and determination such that the above-mentioned difficulties and limitations may be resolved. The present invention meets this need.