Recently, the various advantages of biometric data (representations of innate physical characteristics of a person, such as the structure of his/her face, fingerprints, retinal pattern, voice, vein patterns, etc.) have led to numerous efforts to develop biometric-based verification or security applications. Properties having particular value are the uniqueness (i.e., relatively low likelihood of two people sharing confusingly similar biometric characteristics) and usability (i.e., innate characteristics eliminate the need to manage multiple tokens or passwords) of biometric data. However, a significant shortcoming of biometric data is summarized by the observation, noted in the literature, that biometrics are identifiers, not secrets.
Stated another way, unlike tokens or keywords, where security is provided by the fact that they are secrets known only to the user (absent inadvertent or illicit disclosure), biometric data is not secret. That is, biometric data can be readily recorded by attackers, e.g., an image of a person's face or recording of a person's voice can be surreptitiously captured, latent fingerprints can be scanned, etc. A significant consequence of lost biometric data is that once compromised, it becomes compromised forever. Unlike secret passwords or tokens, compromised biometric data cannot be canceled or revoked, e.g., you can't issue someone new fingerprints. Given the relative scarcity of available biometric data for a person, compromise of even a single characteristic used for authentication/security purposes could be catastrophic.
Recognizing this limitation, various techniques for the production of cancelable or revocable biometrics have been proposed. Generally, these techniques apply an irreversible or one-way transform to biometric data. The resulting transformed biometric data may then be used for authentication/security purposes instead of the actual biometric data. If the transformed biometric data ever becomes known to an attacker, the one-way transform prevents it from being used to reconstruct (or, ideally, even provide a reasonable estimate of) the original biometric data. However, unlike actual biometric data, if the transformed biometric data is compromised, it can be canceled and a new transformed biometric issued using a new one-way transform. In short, the transformed biometric data becomes more like a password or token. One technique that could be employed for this purpose is to use known encryption techniques to encrypt biometric data. However, encryption techniques suffer from the sometimes difficult problem of key management, and the potential for compromise if the key is lost. Furthermore, the computational overhead associated with many encryption systems is substantial.
In another technique, cancelable biometrics are produced by applying distorting transformations to the biometric data or domain-transformed versions thereof. According to this technique, various values within the biometric data are permuted or otherwise scrambled in a known, repeatable fashion. In particular, specific features of the biometric data are selected, e.g., minutiae of a fingerprint in a spatial domain or frequency peaks of a voice sample in a frequency domain, for application of the distortion. Without knowing the specific distortion pattern applied to the biometric data, it becomes difficult to undo the distortion. In a most secure variation, the applied distortion actually causes irretrievable loss of some biometric data, e.g., through mapping of various biometric features onto each other, thereby making it impossible to determine the original configuration of the biometric features. By applying the identical distortion to correlated biometric data (e.g., a fingerprint taken during an enrollment phase versus the same fingerprint recorded during an authentication attempt), it remains possible to compare the distorted versions of the biometric data to provide verification. In the event that distorted biometric data is compromised, new biometric data may be obtained and a new distortion applied. However, the privacy provided by the distorted biometric data (in the case of lossless distortion) is questionable given that large portions of the original biometric data are either not distorted at all or are only minimally distorted, such that reconstruction may still be possible even without knowledge of the original distortion function. Furthermore, in the case of lossy distortion, where some of the features of the biometric data are permanently lost, utility of the biometric data, i.e., uniqueness of the biometric data relative to its provider, may be impaired.
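By way of illustration only, the lossy distortion described above can be sketched in a few lines of Python. The sketch assumes, hypothetically, that fingerprint minutiae have already been quantized to (x, y) cells on a 16x16 grid; the names make_transform, distort and similarity, the seed values, and the overlap score are all assumptions introduced for this example and are not taken from any particular prior art system.

```python
import random

GRID = 16  # hypothetical: minutiae are quantized to a 16x16 grid of cells


def make_transform(seed, grid=GRID):
    """Build a repeatable, keyed, many-to-one map over grid cells.

    Every cell is sent to one of only half as many target cells, so distinct
    cells necessarily collide and the original layout cannot be recovered.
    """
    rng = random.Random(seed)
    cells = grid * grid
    return [rng.randrange(cells // 2) for _ in range(cells)]


def distort(minutiae, table, grid=GRID):
    """Apply the keyed map to a set of (x, y) minutiae cells."""
    return frozenset(table[y * grid + x] for x, y in minutiae)


def similarity(a, b):
    """Overlap (Jaccard) score between two distorted templates."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Enrollment and a later probe of the same finger (one minutia shifted).
enrolled = {(1, 2), (4, 7), (9, 3), (12, 12), (6, 14)}
probe = {(1, 2), (4, 7), (9, 3), (12, 12), (7, 14)}

table = make_transform(seed=2024)
score = similarity(distort(enrolled, table), distort(probe, table))

# Revocation: discard the old template and re-enroll under a fresh seed.
new_table = make_transform(seed=2025)
```

Because the same table is applied at enrollment and at authentication, matching minutiae still map to matching cells, while the collisions in the table are what make the transform lossy. The same collisions are also the source of the reduced uniqueness noted above.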
Other approaches have been recently developed. For some time, researchers have appreciated the theoretical utility of a construct known as Slepian-Wolf coding or distributed source coding (DSC). The basic concepts of DSC are illustrated with reference to FIGS. 1 and 2. As shown in FIG. 1, two correlated signals, X and Y, are encoded 102, 104 and transmitted at bit rates RX and RY, respectively, to a decoder 106 that jointly decodes X′ and Y′. Note that the signals, X and Y, although correlated, come from sources that have no “knowledge” of each other, i.e., they are distributed. Slepian and Wolf established the theoretical limits for the minimum encoding rate required such that X and Y can still be recovered perfectly. This is illustrated in FIG. 2, where the various potential encoding bit rates, RX and RY, are illustrated along the respective horizontal and vertical axes. As known in the art, the entropy of a discrete random variable, H(X), is representative of the minimum amount of information needed to reconstruct a message, e.g., a string of bits, generated according to the random variable. In this sense, entropy is an expression of the limit to which a message can be compressed without loss of information. Rates RX and RY above H(X) and H(Y), respectively, (as illustrated by the upper right quadrant 202) are always sufficient to reconstruct X and Y. Slepian and Wolf demonstrated that the decoder 106 could further reconstruct X and Y perfectly if:
RX + RY ≥ H(X,Y)  (Eq. 1)
RY ≥ H(Y|X)  (Eq. 2)
RX ≥ H(X|Y)  (Eq. 3)
These boundaries are illustrated in FIG. 2 by the heavy line and delineate a region above the boundary in which the respective code rates RX and RY are sufficient to provide perfect reconstruction of X and Y. In other words, the Slepian-Wolf boundary establishes that exact reconstruction is possible so long as the sum of the encoding rates of X and Y is at least their joint entropy, H(X,Y), and that, given the correlation of the sources, the minimum encoding rate for X is limited by the conditional entropy H(X|Y) and, conversely, the minimum encoding rate for Y is limited by the conditional entropy H(Y|X). Intuitively, this means that the correlation of the respective sources can be exploited to transmit X and Y at rates less than their independent entropy limits. Two examples of this are illustrated in FIG. 2. In the first example, X is encoded at a rate H1(X), where H(X|Y)<H1(X)<H(X), and Y is encoded at a rate H1(Y), where H(Y|X)<H1(Y)<H(Y). Although H1(X) and H1(Y) satisfy Eqs. 2 and 3 above, they do not satisfy Eq. 1 and the decoder 106 cannot exactly reconstruct X and Y. However, in the second example, Y is encoded as before, but X is encoded at a rate H2(X), where H(X|Y)<H1(X)<H2(X)<H(X). In this case, Eq. 1 is now satisfied and exact reconstruction is possible. Note that, although the explanation above concerned two correlated sources, the same principles still apply to a more general case in which J correlated sources (J>2) and encoders are used.
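As a numerical illustration of the three Slepian-Wolf conditions, the following Python sketch computes the relevant entropies from a joint probability table and tests whether a rate pair lies in the achievable region. The source model (X a uniform bit, Y equal to X with a 10% chance of being flipped) and the specific rates are assumptions chosen for the example; they are not the values depicted in FIG. 2.

```python
from math import log2


def entropies(joint):
    """Entropies in bits for joint[x][y] = P(X=x, Y=y)."""
    def h(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    hx = h([sum(row) for row in joint])        # H(X) from the row marginals
    hy = h([sum(col) for col in zip(*joint)])  # H(Y) from the column marginals
    hxy = h([p for row in joint for p in row])  # joint entropy H(X,Y)
    # Chain rule: H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X).
    return {"HX": hx, "HY": hy, "HXY": hxy, "HX|Y": hxy - hy, "HY|X": hxy - hx}


def in_slepian_wolf_region(rx, ry, joint):
    """True when the rate pair (rx, ry) satisfies all three conditions."""
    e = entropies(joint)
    return rx + ry >= e["HXY"] and ry >= e["HY|X"] and rx >= e["HX|Y"]


# Assumed source: X is a fair bit, Y equals X except for a 10% flip.
joint = [[0.45, 0.05],
         [0.05, 0.45]]

e = entropies(joint)  # H(X) = H(Y) = 1, H(X|Y) = H(Y|X) ~ 0.469, H(X,Y) ~ 1.469

first = in_slepian_wolf_region(0.6, 0.6, joint)   # False: 0.6 + 0.6 < H(X,Y)
second = in_slepian_wolf_region(0.9, 0.6, joint)  # True: 0.9 + 0.6 >= H(X,Y)
```

The two rate pairs mirror the two examples in the text: both individual rates sit between the conditional and the marginal entropies, but only the second pair also meets the sum-rate condition.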
The concept of Slepian-Wolf coding has only recently inspired attempts to employ its principles for authentication purposes based on biometric data. For example, techniques have been proposed in which so-called syndrome codes (perhaps best known for their utility as part of channel coding or error correction schemes) are used during an enrollment phase as one-way transformations of biometric data. The syndromes (and hash values based thereon) are stored and subsequently used during an authentication process in which the syndrome is decoded using additional biometric data, received at the point of authentication, to produce an estimate of the original biometric data and/or the hash value. Thereafter, comparison of the hash values (or the authentication and reconstructed biometric data using conventional biometric comparison techniques) may be performed to authenticate the user. While this technique aspires to adhere to Slepian-Wolf principles, in fact, it does not use jointly compressed, distributed biometric data sources, in that the authentication biometric data is never compressed and is, instead, directly compared (or used to develop a hash value that is compared) at the point of authentication. In short, this technique does not perform joint decoding. Furthermore, these techniques fail to provide a realistic solution for the provision of cancelable biometric data. While the syndrome encoding does result in a one-way transformation of the biometric data, the complexity of generating such codes makes it unreasonable to expect that new syndrome codes could be applied every time transformed biometric data is compromised. Stated another way, syndrome coding is not readily scalable to providing a robust, cancelable biometric solution.
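The enrollment and authentication flow of such syndrome-based schemes can be sketched with a toy example. The Python below uses the textbook (7,4) Hamming code and a 7-bit "biometric" purely for illustration; practical proposals would operate on far longer feature vectors with stronger codes, and the function names enroll and authenticate, the use of SHA-256, and the sample bit patterns are assumptions of this sketch rather than details of any cited technique.

```python
import hashlib

# Parity-check matrix of the (7,4) Hamming code.
# Column i (0-based) is the binary representation of i + 1, least significant bit in row 0.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(bits):
    """Compute the 3-bit syndrome H * bits (mod 2)."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)


def digest(bits):
    """Hash of the bit vector, stored so the raw biometric need not be kept."""
    return hashlib.sha256(bytes(bits)).hexdigest()


def enroll(x):
    """Enrollment: keep only the syndrome and the hash of the biometric x."""
    return syndrome(x), digest(x)


def authenticate(y, stored_syndrome, stored_digest):
    """Decode the stored syndrome using the probe y as side information."""
    # The syndrome difference equals the syndrome of the error pattern x XOR y.
    diff = tuple(a ^ b for a, b in zip(syndrome(y), stored_syndrome))
    position = diff[0] + 2 * diff[1] + 4 * diff[2]  # 0 means no bit differs
    x_hat = list(y)
    if position:
        x_hat[position - 1] ^= 1  # undo the single presumed bit difference
    return digest(x_hat) == stored_digest


x = [1, 0, 1, 1, 0, 0, 1]                 # enrollment reading (toy data)
stored_syndrome, stored_digest = enroll(x)

close_probe = [1, 0, 1, 0, 0, 0, 1]       # same finger, one bit of noise
far_probe = [0, 1, 1, 1, 0, 0, 1]         # two bits differ: beyond the code
```

Note that the probe is used uncompressed as side information and only the enrollment data is encoded, which is the asymmetry the paragraph above identifies when it observes that no joint decoding takes place.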
Thus, techniques for producing cancelable biometrics that overcome the limitations of prior art techniques would represent a significant advancement.