1. Field
Embodiments of the present invention generally relate to biometric signatures and identification. More specifically, embodiments of the present invention seek to provide means for revocable biometric signatures and identification using robust distance metrics.
2. Description of the Related Art
Biometrics generally are methods of identifying or verifying the identity of a person based on a physiological characteristic. Examples of the features measured are: face, fingerprints, hand geometry, palmprints, iris, retina, vein, and voice. To be most effective, features to be measured should be distinctive between people and sufficiently invariant over the lifetime of the person and across sensor variations. Biometric technologies are becoming the foundation of an extensive array of highly secure identification and personal verification solutions. Throughout this discussion we use the term probe to mean biometric data being tested, and gallery to mean the collection of biometric data to which the probe is being compared.
Biometric signatures, the derived features which are actually matched, typically range from tens of bytes to over a megabyte, and thus have an advantage in that their information content is typically much higher than that of a password. Modern biometrics are generally based on “similarity matching,” in which a pseudo-distance metric is computed between the biometric signatures. The ability to compute distances is important since intra-subject variations may sometimes be larger than inter-subject variations. Thus, systems often provide “top-N” matching, i.e. given the probe find the closest N examples in the gallery, to improve the chance the subject is identified, or use calibrated pseudo-distances in verifying the subject, accepting the probe's claim as verified if and only if the pseudo-distance between the probe and the claimed gallery entry is below a threshold.
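The two matching modes just described can be sketched as follows. This is a minimal illustration only: the Euclidean pseudo-distance, the three-element feature vectors, the identity names, and the threshold value are all assumptions chosen for the example and are not taken from any particular biometric system.

```python
import math

def pseudo_distance(a, b):
    """Euclidean distance between two equal-length feature vectors (illustrative choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_n(probe, gallery, n):
    """Return the n gallery identities closest to the probe, nearest first."""
    ranked = sorted(gallery, key=lambda ident: pseudo_distance(probe, gallery[ident]))
    return ranked[:n]

def verify(probe, gallery, claimed_id, threshold):
    """Accept the claim if and only if the distance to the claimed entry is below the threshold."""
    return pseudo_distance(probe, gallery[claimed_id]) < threshold

# Hypothetical gallery of enrolled signatures.
gallery = {
    "alice": [0.10, 0.80, 0.30],
    "bob":   [0.90, 0.20, 0.50],
    "carol": [0.15, 0.70, 0.40],
}
probe = [0.12, 0.78, 0.33]  # a noisy new sample, close to but not equal to "alice"

print(top_n(probe, gallery, 2))
print(verify(probe, gallery, "alice", threshold=0.1))
print(verify(probe, gallery, "bob", threshold=0.1))
```

Raising or lowering the threshold trades false accepts against false rejects, which is why the text refers to calibrated pseudo-distances.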
A number of systems have been designed that presume a biometric that consistently maps a biometric property of an individual to a unique key, such that no two individuals have the same key. While various attempts have been made, often clouded in cryptographic obscurity, such “perfect biometrics” do not yet exist. Biometrics suffer from variations in sensors, measurements, and alignment, and most suffer from actual variation or drift of the biometric signature itself. Intra-subject variation is therefore non-trivial. Most biometric systems depend on “similarity matching,” with higher “similarity” generally providing higher confidence in the match. This then supports different levels of false alarm or rejection risk decisions at different levels of confidence and for different applications.
There are a number of privacy concerns with biometric systems. First, there are the concerns that have been raised about the storage of biometric information. A person's biometric data is significantly invariant over time, and thus cannot be changed. This invariance serves as a key attribute, but also a liability. If the database or other repository is compromised, or a person's biometric data otherwise falls into the wrong hands, the loss is permanent. With techniques that allow reproduction, such as literally “printing” fingerprints from images of a fingerprint [2], the potential loss is substantial. The compromised biometric cannot be “replaced.” The concept of biometric signatures that can be canceled or revoked, and then replaced with a new signature, will provide privacy while not compromising security.
There are other privacy concerns as well. There are concerns about such private data being required and stored in many locations by many different government or other agencies. This is especially an issue with fingerprints because of their association with law enforcement investigations. Another concern is that a unique biometric stored in different databases can be used to link these databases and hence support non-approved correlation of data. Finally, there is the concern about searchable biometric databases, wherein covertly obtained biometric data, such as a face image or latent fingerprint, could be used to find additional information.
Before discussing prior art in protecting biometrics, we address the issue of what constitutes protection of data. For clarity of discussion we consider protecting a collection of numbers x1 . . . xn. Initially, it might seem sufficient to subject the data to a transform that is not mathematically invertible, e.g. yi=xi^2. While the function is mathematically non-invertible, each point has only a 2-point ambiguity. Anyone who has ever done a cryptogram or puzzle knows that even moderate amounts of ambiguity are easily overcome with a little bit of knowledge or constraints. For example, if we knew the xi's were locations in an image and hence positive, there would be no ambiguity. If the xi's are shifted before squaring, say by a “random” but known translation, there would still be no ambiguity. While the transform is formally non-invertible on each datum, knowledge of constraints and/or correlations in sets of data can often be exploited to remove ambiguity and hence effectively invert the overall transform. Thus we can conclude that using a mathematically non-invertible transform is not a sufficient criterion to provide protection.
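The squaring example can be made concrete with a short sketch. The coordinate values, the image width of 640, and the shift of 17 are hypothetical, and the sketch assumes integer data with a known, non-negative shift. Under those assumptions the range constraint alone removes the 2-point ambiguity.

```python
import math

def protect(xs, shift=0):
    """Square each (optionally shifted) value; mathematically non-invertible per datum."""
    return [(x + shift) ** 2 for x in xs]

def recover(ys, shift=0, lower=0, upper=639):
    """List every pre-image of each y that satisfies lower <= x <= upper."""
    result = []
    for y in ys:
        root = math.isqrt(y)
        candidates = {root - shift, -root - shift}  # the two mathematical pre-images
        result.append(sorted(c for c in candidates if lower <= c <= upper))
    return result

xs = [12, 250, 431]  # hypothetical pixel column coordinates in a 640-wide image

# Without any constraint, each value keeps its 2-point ambiguity.
print(recover(protect(xs), lower=-639))

# Knowing the values are image locations (0..639) leaves one candidate each,
# with or without a known shift applied before squaring.
print(recover(protect(xs)))
print(recover(protect(xs, shift=17), shift=17))
```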
To see that one can have protection without requiring a mathematically non-invertible transform, one need only consider encryption. As anyone skilled in the art will know, without knowledge of the keys, encryption algorithms protect the underlying data from recovery. With public key algorithms, such as the well known RSA algorithm (U.S. Pat. No. 4,405,829), it is practical to have the algorithm and data necessary to protect data be publicly known yet still be able to recover the well protected data at some future date. Thus we can conclude that a mathematically non-invertible transform is neither necessary nor sufficient to provide protection of data.
While public key encryption (hereafter PK) can protect data, it cannot directly solve the problem of biometric data protection. While the encryption can be public, the data would need to be decrypted before it could be matched. A key property of any encryption algorithm is that even a single bit change in the input will cause significant changes in the encrypted data. All biometric data has inherent intra-subject variations across samples. Hence, we cannot just match two encrypted biometric signatures. If the data must be decrypted before each use, it remains vulnerable to capture after decryption. Furthermore, since it will be decrypted for each use, the keys must be widely distributed, and because of the computational cost of decrypting each time there will be a strong motivation for the operators to store the gallery in an unencrypted form. Finally, the encryption approach provides no protection against insiders either abusing their use of the data or selling it.
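The sensitivity to single-bit changes can be observed with a short sketch. As an assumption made purely for illustration, SHA-256 from the Python standard library stands in for an encryption algorithm here. It is a hash rather than a cipher, but it exhibits the same behavior of interest: nearly identical inputs produce unrelated outputs, so closeness of the original samples is not reflected in the protected data. The four-byte samples are hypothetical.

```python
import hashlib

def hamming_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Two hypothetical samples from the same subject, differing in a single bit.
sample_a = bytes([0x10, 0x20, 0x30, 0x40])
sample_b = bytes([0x10, 0x20, 0x30, 0x41])

digest_a = hashlib.sha256(sample_a).digest()
digest_b = hashlib.sha256(sample_b).digest()

# Distance between the raw samples versus distance between the protected forms.
print(hamming_bits(sample_a, sample_b))
print(hamming_bits(digest_a, digest_b), "of", len(digest_a) * 8)
```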
What is desired is not a transform that is simply mathematically non-invertible, but rather a transform that is “cryptographically secure”, by which we mean the data is protected such that recovering it, either by analysis or brute force guessing, is computationally intractable. A key component of the present invention is how to provide for “cryptographically secure” transforms of biometric data that can be matched while in encoded form and without decryption.
An important piece of prior art is U.S. Pat. No. 6,836,554 B1, “System and method for distorting a biometric for transactions with enhanced security and privacy”, Bolle et al. 2004. This patent follows directly from an earlier paper, N. Ratha, J. Connell, R. Bolle, “Enhancing security and privacy in biometrics-based authentication systems”, IBM Systems Journal, vol. 40 no. 3, 614 (2001). U.S. Pat. No. 6,836,554, which is incorporated herein by reference, has considerable discussion of other relevant prior art.
In U.S. Pat. No. 6,836,554, the patent claims are focused on repeatable non-invertible distortions applied in the signal domain or feature domain. In the description it is suggested that distortions can be applied in either the image (signal domain) or on feature points (feature domain), during both enrollment and verification. When applied to images rather than features, the patent teaches only techniques that are trivially invertible, hence inconsistent with the claims. Also, it generally ignores how such transforms can degrade the system's ability to detect the features needed for identification. In addition, such distortions can have a significant negative impact on the measure between the probe and gallery image, thereby degrading the matching accuracy. For feature space transforms, the patent presents only 3 high level examples of non-invertible distortions, with insufficient detail to provide an understanding of how to apply them to provide protection. As previously discussed, applying a non-invertible transform per feature is neither necessary nor sufficient to provide protection. Constraints and/or correlations may result in the majority of the transformed data being recovered (inverted) even when each individual transform is mathematically non-invertible. U.S. Pat. No. 6,836,554 does not discuss this critical issue, nor does it provide an example that provides protection. Furthermore, the patent does not teach us the “comparison process” that is central in its claims, i.e. it does not address the critical issue of how to compute distance measures between transformed features. Those skilled in the art will recognize that many different pseudo-measures can be used for biometric signature comparison, but equally well know that such measures play a critical role in determining algorithm effectiveness. It is unclear what measure would apply after the non-invertible distortions of U.S. Pat. No. 6,836,554.
The non-invertible distortions suggested must, by definition, induce ambiguity in matching. Hence, they would significantly degrade the direct application of existing pseudo-distance measures on the biometric data as they increase intra-subject variations. While U.S. Pat. No. 6,836,554 and Ratha et al. [1] introduce some interesting concepts, they do not describe an implementation, provide no accuracy/performance examples, and overall fail to teach us how to achieve the patent's claims.
Other related prior art can be found in J. Cambier, U. Cahn von Seelen, R. Glass, R. Moore, I. Scott, M. Braithwaite, and J. Daugman. “Application-Specific Biometric Templates.” IEEE Workshop on Automatic Identification Advanced Technologies, Tarrytown, N.Y., Mar. 14-15, 2002, p. 167-171. In that paper the authors suggest an application specific biometric that cannot be matched across applications, but can when authorized be transformed to support changes in the user key or to generate a new key for a different application. Their approach is presented for the case of bit based representation where the “distance” between two transformed biometric signatures is the bit error, or some simple block based bit error rate. Their approach makes many assumptions on the transforms that will be very difficult to implement, but does provide two examples that satisfy their constraints. Important among their constraints is that the pseudo-distance between a probe and gallery must be the same before and after each of them is transformed. Thus their transformations do not degrade matching quality.
The requirement for invertibility of the transforms set forth in Cambier et al. is a weakness that limits the protection provided by the approach. The transform parameters may be stored at the point where the transformation of the biometric signature is applied. By their design, with those parameters and the stored signature the original biometric signature can be recovered. However, this means that if both are compromised, the biometric is compromised. Since the transformation parameters are generally applied at the client side, they will likely be either transmitted or carried on a smart card. Thus the design has traded the need to protect one set of data, the original biometric, for the need to protect the transformation parameters and each of the transform databases.
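Both properties discussed above, distance preservation and recoverability, can be seen in a small sketch on bit-based signatures. The two operations used, an exclusive-or with a key mask followed by a permutation of bit positions, are chosen here as an assumption because they are simple transforms that preserve bit error; they are not claimed to be the specific examples given by Cambier et al. The 64-bit signatures and the seeded random generator are likewise hypothetical.

```python
import random

def hamming(a, b):
    """Bit error between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))

def transform(bits, mask, perm):
    """XOR with a key mask, then move bit perm[i] to output position i."""
    masked = [b ^ m for b, m in zip(bits, mask)]
    return [masked[i] for i in perm]

def invert(tbits, mask, perm):
    """Undo the permutation, then undo the XOR; requires the same parameters."""
    masked = [0] * len(tbits)
    for out_pos, in_pos in enumerate(perm):
        masked[in_pos] = tbits[out_pos]
    return [b ^ m for b, m in zip(masked, mask)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 64
gallery_bits = [rng.randint(0, 1) for _ in range(n)]
probe_bits = list(gallery_bits)
for i in rng.sample(range(n), 5):  # flip 5 distinct bits to mimic intra-subject variation
    probe_bits[i] ^= 1

mask = [rng.randint(0, 1) for _ in range(n)]  # transform parameters (the "key")
perm = list(range(n))
rng.shuffle(perm)

t_gallery = transform(gallery_bits, mask, perm)
t_probe = transform(probe_bits, mask, perm)

print(hamming(gallery_bits, probe_bits), hamming(t_gallery, t_probe))  # equal distances
print(invert(t_gallery, mask, perm) == gallery_bits)  # original recovered from stored data
```

The last line illustrates the weakness: anyone holding both the stored signature and the transform parameters recovers the original biometric signature.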
The transforms needed in U.S. Pat. No. 6,836,554 and Cambier et al. will likely either be in a central database, accessed before computing the transformed space, or on a smart card. If stored in a central database, either technique could be designed for both identification and verification. Given an unknown sample, such as a latent fingerprint, the system could obtain all transforms from the centralized database, apply each in turn, and if the result is “verified” include that subject as an identification result. This approach, viewing identification as a sequence of verifications of each subject in the database, may not be as fast or quite as effective as a system optimized for identification, but still provides basic identification ability. When used directly, neither approach provides privacy against search. To provide search protection, both techniques mention the use of smart-card storage so that no centralized storage of the transform exists.
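Identification as a sequence of verifications can be sketched as below. The sketch assumes a central database that holds, for each subject, both the transform parameters and the transformed signature. The per-subject transform is a simple exclusive-or mask, and the 8-bit signatures, subject names, and threshold are hypothetical values chosen for illustration; neither cited work is claimed to use this exact scheme.

```python
def hamming(a, b):
    """Bit error between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))

def apply_transform(bits, mask):
    """Per-subject transform: XOR with that subject's mask (illustrative choice)."""
    return [b ^ m for b, m in zip(bits, mask)]

def enroll(database, subject, raw_bits, mask):
    """Store the transform parameters alongside the transformed signature."""
    database[subject] = (mask, apply_transform(raw_bits, mask))

def identify(sample, database, threshold):
    """Run one verification per enrolled subject and collect every subject that passes."""
    hits = []
    for subject, (mask, stored) in database.items():
        if hamming(apply_transform(sample, mask), stored) <= threshold:
            hits.append(subject)
    return hits

database = {}
enroll(database, "alice", [1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1])
enroll(database, "bob",   [0, 1, 0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0, 1, 0])

# An unknown sample, e.g. a latent print: alice's signature with one bit in error.
latent = [1, 0, 1, 1, 0, 1, 1, 0]
print(identify(latent, database, threshold=2))
```

Because the database supplies every subject's transform, the search succeeds without any cooperation from the subject, which is the lack of search privacy noted in the text.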
Another approach that is implied in various research papers and elsewhere is an encryption of the biometric data to produce a unique key. Such an approach might include a user passcode, allowing it to be revocable. However, such an approach has two primary problems. First, if the encryption needs to be inverted to match on the original data, then the system will need the user passcode and must convert the data back to its original form for matching, hence providing access to the original biometric data. Second, if the approach does not invert the data, then it must be matching the encrypted form of the biometric. However, the process of encryption may transform input such that adjacent items, i.e. nearly identical biometrics, will be encoded to very different numbers. Given that any biometric has a range of expected variations for the same individual, either the encrypted biometric will often not match the individual, or the data must be degraded so that all variations for an individual map to the same data. However, this would significantly degrade the miss detection rate. Furthermore, the quantization implicitly necessary to ensure no variation in the user's data would have to fix the FMR/FNMR rates, a decision which would limit use in different applications.