1. Field of the Invention
The present invention relates to a secure data storage and management method that takes data variability into consideration. The present invention also relates to a pattern recognition and data protection technique that converts data into a secure form so that the original data used for registration cannot be recovered or revealed even if a system or database is hacked or compromised. In addition, the present invention relates to a pattern recognition and data protection method in which newly input data are compared with, or recognized against, registered data in a transformed state in order to protect the original data used for registration.
Moreover, the present invention relates to a method of converting a user's biometric information into a secure form to protect the user's biometric information and privacy, and to a method of recognizing/authenticating the user by using the converted biometric information so that the user's original biometric information cannot be exposed. In addition, the present invention relates to a data encryption and decryption technique for securely storing and releasing secret information by using biometric data, which are unique to a person and can be used for user identification and authentication but vary with every acquisition, even when acquired from the same person.
This work is supported by the IT R&D program of MIC/IITA [2007-S-020-01, Development of Privacy Enhanced Biometric System].
2. Description of the Related Art
Pattern recognition is applied in various fields of modern society. General applications of pattern recognition include user-computer interface techniques such as voice recognition and face recognition, handwriting recognition, automatic spam filtering, web searching, biometrics for user identification and authentication, and the like.
In addition, as the demand for automatic analysis of massive data has increased, pattern recognition applications have extended to data mining techniques such as analysis of personal consumption patterns for customized advertising and automatic health check-ups using a user's medical information.
In pattern recognition applications, a template or a model that represents feature data or a data group is generally created and stored in a system. The system then compares newly input data with the registered template and determines how similar the newly input data are to the registered data, or whether the two are classified into the same class.
For example, assume a handwriting recognition system for recognizing and identifying the letter ‘A’. The system receives the letter ‘A’ from a user in advance, extracts unique features of the letter ‘A’, and generates and stores a template or a model for the letter ‘A’. Thereafter, when a user inputs an arbitrary letter, the system compares the input letter with the stored template or model for the letter ‘A’ and calculates a similarity or a dissimilarity between them. The system then determines whether the newly input letter is the letter ‘A’ by checking whether the similarity or dissimilarity is larger or smaller than a predetermined value, that is, a threshold.
When a similarity is used as the comparison value, the input letter is classified as the letter ‘A’ if the similarity is larger than the threshold, and as a letter other than ‘A’ if the similarity is smaller than the threshold. When a dissimilarity such as the Euclidean distance is used as the comparison value, the input letter is classified as the letter ‘A’ if the comparison value is smaller than the threshold, and as a letter other than ‘A’ if the comparison value is larger than the threshold.
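A minimal Python sketch of the two threshold rules described above, assuming that feature extraction has already turned each handwritten letter into a fixed-length numeric vector. The vectors, the thresholds, and the choice of cosine similarity as the similarity measure are illustrative assumptions; only the Euclidean distance is named in the text.

```python
import math
from typing import Sequence

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dissimilarity: smaller values mean the vectors are more alike."""
    return math.dist(a, b)

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity: larger values (up to 1.0) mean the vectors are more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def is_a_by_similarity(template, sample, threshold: float) -> bool:
    # Similarity rule: accept as 'A' when the similarity exceeds the threshold.
    return cosine_similarity(template, sample) > threshold

def is_a_by_distance(template, sample, threshold: float) -> bool:
    # Dissimilarity rule: accept as 'A' when the distance is below the threshold.
    return euclidean_distance(template, sample) < threshold

# Hypothetical feature vectors produced by an unspecified feature extractor.
template_a = [0.90, 0.10, 0.80, 0.30]  # registered template for 'A'
another_a = [0.85, 0.15, 0.82, 0.28]   # a new handwritten 'A'
letter_b = [0.20, 0.90, 0.10, 0.70]    # some other letter

print(is_a_by_similarity(template_a, another_a, 0.95))  # True
print(is_a_by_distance(template_a, letter_b, 0.30))     # False
```

Note that the two rules point in opposite directions: a similarity must exceed its threshold, whereas a distance must stay below it.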
In summary, in the aforementioned method, a template or a model that is presumed to belong to the same class as the input data or input entity is designated, and the input data are compared with the designated template through one-to-one matching to determine whether the input data belong to the same class as the template or model.
Pattern recognition is also used for web searching, a recognition technique based on one-to-many comparison.
In web searching, information on many homepages is collected and summarized, and the summary of each homepage is registered in a database as a template or a model of that homepage. When a user inputs a keyword, the web-searching system compares the keyword with the templates or models of the registered homepages and displays a list of them to the user in order of similarity.
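The one-to-many search can be sketched the same way. In this hypothetical example each homepage template is reduced to a set of keywords and the Jaccard index serves as the similarity measure; both the page names and the measure are assumptions chosen for brevity, not the method of any particular search engine.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity between two keyword sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def search(query: str, templates: Dict[str, Set[str]]) -> List[str]:
    """Compare the query with every registered template (one-to-many)
    and return the matching pages in descending order of similarity."""
    q = set(query.lower().split())
    scored = [(jaccard(q, words), page) for page, words in templates.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best match first
    return [page for score, page in scored if score > 0]

# Hypothetical registered templates (homepage summaries reduced to keyword sets).
templates = {
    "cooking-blog": {"recipe", "pasta", "cooking", "italian"},
    "travel-guide": {"travel", "italy", "rome", "hotel"},
    "pasta-history": {"pasta", "history", "italy"},
}

print(search("Pasta recipe", templates))  # ['cooking-blog', 'pasta-history']
```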
In most pattern recognition application systems, whether they use one-to-one or one-to-many matching, input data are compared with a registered template as described above, and a similarity or dissimilarity is calculated as the comparison value. This is because pieces of data used in pattern recognition do not have identical values even when acquired from the same person, the same entity, or the same device; they show slightly different values at every input and acquisition. Therefore, checking whether two pieces of data have exactly the same value cannot determine whether they belong to the same class.
For example, in the handwriting recognition technique described above, even the same user cannot write the letter ‘A’ identically several times. As another example, in fingerprint recognition, the acquired fingerprint data differ depending on the direction and pressure of the finger.
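The effect of data variability can be simulated by adding small random noise to a feature vector, as a rough stand-in for repeated acquisitions from the same finger. The Gaussian noise model, the noise level, and the threshold are assumptions; the point is only that an exact-equality test fails while a distance threshold still succeeds.

```python
import math
import random
from typing import List

def acquire(true_features: List[float], rng: random.Random, noise: float = 0.01) -> List[float]:
    """Simulate one acquisition: the true features plus small Gaussian sensor noise
    (an assumed noise model, chosen only for illustration)."""
    return [x + rng.gauss(0.0, noise) for x in true_features]

rng = random.Random(7)                   # fixed seed so the demo is repeatable
finger = [0.52, 0.31, 0.77, 0.18]        # hypothetical "true" features of one finger
enrolled = acquire(finger, rng)          # sample stored at registration
probes = [acquire(finger, rng) for _ in range(5)]  # later samples from the same finger

THRESHOLD = 0.1  # assumed acceptance threshold on Euclidean distance
exact_matches = sum(p == enrolled for p in probes)
threshold_matches = sum(math.dist(p, enrolled) < THRESHOLD for p in probes)
print(exact_matches, threshold_matches)  # exact equality essentially never holds
```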
If a template or model registered in a system using pattern recognition techniques is leaked and abused, serious social and economic problems may occur. For example, assume that a consumer's list of purchased goods is stored in a database to obtain the consumer's consumption pattern or for personalized advertising. Because the stored data reflect the consumer's private consumption pattern, their leakage can violate the consumer's privacy. Personal medical information stored for automatic health check-ups is even more privacy-sensitive than such consumption data.
A field that general users encounter more directly is biometrics. A biometric system identifies a person by using data on the person's physical/behavioral features. Like the general pattern recognition system that generates templates as described above, the biometric system generates and uses a template containing a user's physical/behavioral features and information for user registration and identification.
The template registered and stored in the recognition system is referred to as a gallery, and a template newly generated from a user who requests authentication is referred to as a probe. When a user requests identification, the biometric system retrieves the gallery, compares it with the probe, and classifies the user as a genuine user or an impostor based on the comparison result.
Because biometric data contain unique information about users, their disclosure or leakage can violate privacy, as in the aforementioned examples.
In particular, biometric data are used as a kind of password for security. Therefore, when a research institute or business uses a biometric security apparatus, leakage of the biometric information threatens not only privacy but also the security of the research or business. In addition, unlike general passwords, the number of biometric identifiers available for user authentication is limited; for example, a human has only one face and ten fingerprints. Thus, the loss or compromise of a user's biometric data can be permanent, and the leakage of biometric data is more serious than that of other pattern recognition data.
Therefore, for important or privacy-sensitive data such as biometric information, methods of encrypting and storing the data so that the original information cannot be exposed have been suggested.
However, such methods face a difficulty: as described above, most pattern recognition data, including biometric data, never yield exactly the same value twice, whereas a cryptographic function, by its nature, encrypts even very similar values into completely different values.
Therefore, when newly input data are encrypted and compared with data that were encrypted and registered in advance, the resulting comparison value is not consistent with the comparison value obtained by comparing the unencrypted input data with the original data. Consequently, the encrypted data cannot be used directly for pattern recognition; they must first be decrypted for comparison and recognition.
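The following sketch illustrates why encrypted templates cannot be compared directly. Python's standard library has no block cipher, so HMAC-SHA256 under a secret key is used here as a stand-in for a deterministic encryption function; it is a keyed pseudorandom function, not a decryptable cipher, but it shares the relevant property that nearby inputs produce unrelated outputs. The key and feature values are hypothetical.

```python
import hashlib
import hmac
import math
import struct
from typing import List

SECRET_KEY = b"hypothetical-system-key"  # assumed key, for illustration only

def encode(features: List[float]) -> bytes:
    """Serialize a feature vector as little-endian 64-bit floats."""
    return struct.pack(f"<{len(features)}d", *features)

def keyed_transform(features: List[float]) -> bytes:
    """Stand-in for deterministic encryption: HMAC-SHA256 under a secret key.
    Like a cipher it scrambles its input; unlike a cipher it cannot be decrypted."""
    return hmac.new(SECRET_KEY, encode(features), hashlib.sha256).digest()

def hamming_bits(x: bytes, y: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))

enrolled = [0.52, 0.31, 0.77, 0.18]
probe = [0.52, 0.31, 0.77, 0.19]  # nearly identical second acquisition

print(math.dist(enrolled, probe))  # small distance in the original feature space
print(hamming_bits(keyed_transform(enrolled), keyed_transform(probe)))  # typically near 128 of 256 bits
```

A distance of about 0.01 between the inputs becomes roughly half of all output bits flipped, so any threshold applied to the transformed values says nothing about the closeness of the originals.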
A method of protecting a registered template by encryption therefore has the disadvantage that the encrypted template must be decrypted whenever a comparison is performed, which creates a security vulnerability.
As another method of protecting sensitive data such as a password, a method using a hash function has been proposed. In a typical password-based authentication system using this method, the user's password is not compared directly for authentication; instead, a hash of the password is stored in advance and compared with the hash of the password input at authentication time.
However, whereas a user can input exactly the same password every time, pattern recognition data such as biometric information cannot reproduce the same value even when input from the same person, the same entity, or the same device, and a hash function maps similar input values to quite different output values. Therefore, pattern recognition using hashed templates is not accurate.
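A sketch of the contrast between hashed passwords and hashed biometric templates, using PBKDF2-HMAC-SHA256 from Python's `hashlib`. The password, the salt handling, the iteration count, and the feature vectors are illustrative assumptions.

```python
import hashlib
import hmac
import os
import struct
from typing import List

ITERATIONS = 100_000  # assumed work factor

def hash_secret(data: bytes, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 hash, as a password system might store it."""
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)

def encode(features: List[float]) -> bytes:
    """Serialize a feature vector as little-endian 64-bit floats."""
    return struct.pack(f"<{len(features)}d", *features)

salt = os.urandom(16)

# Password case: the user types the same string every time, so the hashes match.
stored_pw = hash_secret(b"correct horse battery staple", salt)
pw_ok = hmac.compare_digest(stored_pw, hash_secret(b"correct horse battery staple", salt))

# Biometric case: two acquisitions of the same finger differ slightly (hypothetical values).
enrolled = [0.52, 0.31, 0.77, 0.18]
second = [0.51, 0.32, 0.77, 0.18]
stored_bio = hash_secret(encode(enrolled), salt)
bio_ok = hmac.compare_digest(stored_bio, hash_secret(encode(second), salt))

print(pw_ok, bio_ok)  # True False
```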
The same difficulty that data variability poses for protecting pattern recognition data with existing encryption techniques also arises in biometric-based key management, which combines a general encryption system with a biometric system.
In a general encryption-based user authentication and security system, the user is authenticated or data is encrypted by using a password or a private key of the user. It is well known that a long password or a long private key that is randomly generated has to be used to obtain high security. However, it is very difficult for the user to always remember the long password accurately.
To solve this problem, the original long and complex password or private key is encrypted with a short and simple password that the user can easily remember, and the original long and complex password or private key is decrypted as needed for a general encryption operation. However, since the security of the long and complex password then depends on the short and simple password, the overall security is equivalent to that of the short and simple password alone.
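To make the last point concrete, the sketch below wraps a random 256-bit private key under a 4-digit PIN and then recovers it by trying all 10,000 PINs. The XOR-with-derived-key wrapping, the stored key check value, the PIN itself, and the low PBKDF2 iteration count are simplifying assumptions for a fast demo; a real system would use a standard key-wrapping scheme, but the search space would still contain only 10^4 candidates.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100  # deliberately low so the demo runs quickly

def kek_from_pin(pin: str, salt: bytes) -> bytes:
    """Derive a 32-byte key-encryption key from a short PIN."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def wrap(private_key: bytes, pin: str, salt: bytes) -> bytes:
    # Simplified wrapping: XOR with the PIN-derived key (not a standard key-wrap scheme).
    return xor(private_key, kek_from_pin(pin, salt))

def brute_force(wrapped: bytes, salt: bytes, check: bytes) -> Optional[Tuple[str, bytes]]:
    """Try every 4-digit PIN; the stored check value reveals the right one."""
    for i in range(10_000):
        pin = f"{i:04d}"
        candidate = xor(wrapped, kek_from_pin(pin, salt))
        if hmac.compare_digest(hashlib.sha256(candidate).digest(), check):
            return pin, candidate
    return None

private_key = os.urandom(32)                  # long, random 256-bit key
salt = os.urandom(16)
wrapped = wrap(private_key, "0427", salt)     # protected only by a hypothetical PIN
check = hashlib.sha256(private_key).digest()  # assumed key check value stored with it

result = brute_force(wrapped, salt, check)
print(result is not None and result[1] == private_key)  # True: the PIN was found
```

Raising the iteration count only multiplies the attacker's cost by a constant factor; it does not enlarge the 10^4-element search space.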
As another method, the original long and complex password or private key is stored in a personal storage device such as a smartcard and released for use as needed. However, the smartcard can be lost.
Because users cannot practically remember and manage a long password or private key, a conventional authentication and encryption system depends on a low-security component, such as a short password or a smartcard, which lowers the security level of the entire system.
To solve this user key management problem of existing encryption and security systems effectively, biometric information may be used to manage and protect the password or private key instead of a password that the user may forget or a smartcard that can be lost.
As a method of applying biometrics to user key management in existing encryption systems, a biometric-based key release method has been proposed. This method uses the user's biometric information to authenticate the user and, when the user is authenticated as genuine, releases the corresponding user's security information, such as a password or a private key, from a smartcard, a system, or a database.
This method has the advantage that biometrics and the encryption system can be easily combined and implemented. However, since the biometric information registered for user authentication and the security information such as the password or private key are stored separately in the system, a security problem remains: a hacker may bypass the biometric system entirely, directly steal the password or private key used for the encryption operation, and exploit it. Therefore, the biometric-based key release method fundamentally cannot protect security information such as the password or private key. In addition, as in existing biometric systems, the biometric information of users registered in the system can be exposed.
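A minimal model of biometric-based key release, assuming templates are feature vectors matched by Euclidean distance. The record layout, the threshold, and the feature values are hypothetical. The last lines show the weakness described above: anyone who can read the stored record obtains both the private key and the template without presenting any biometric sample.

```python
import math
import os
from dataclasses import dataclass
from typing import List, Optional

THRESHOLD = 0.1  # assumed acceptance threshold on Euclidean distance

@dataclass
class UserRecord:
    template: List[float]  # enrolled biometric template, stored as-is
    private_key: bytes     # secret stored alongside it, released on a match

def release_key(record: UserRecord, probe: List[float]) -> Optional[bytes]:
    """Authenticate with the biometric probe; release the key only on a match."""
    if math.dist(record.template, probe) < THRESHOLD:
        return record.private_key
    return None

record = UserRecord(template=[0.52, 0.31, 0.77, 0.18], private_key=os.urandom(32))

genuine = release_key(record, [0.51, 0.32, 0.77, 0.19])   # same user, slight variation
impostor = release_key(record, [0.10, 0.95, 0.05, 0.60])  # different user

# An attacker who can read the stored record needs no biometric at all:
stolen_key = record.private_key
stolen_template = record.template
```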
Therefore, an ideal method is to encrypt and store the long and complex password or private key used for general encryption operations by using the user's biometric information as a cryptographic key. As needed, the password or private key is then decrypted by using the user's biometric information, and the decrypted long and complex password or private key is used for general encryption operations such as encryption/decryption.
However, biometric information acquired from the same user or the same device does not have fixed values; the acquired values differ at every acquisition. Because the hash functions and encryption/decryption techniques used in existing encryption-based security systems generate completely different output values from similar input values, biometric information cannot be used directly as the key in existing encryption methods, as described above.
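A sketch of this naive approach, in which a key is derived by hashing the raw feature vector with SHA-256 and used to wrap a private key. The encoding, the XOR wrapping, the check value, and the feature values are hypothetical. Unwrapping succeeds only with the bit-identical enrollment sample; any fresh acquisition, however close, derives a different key.

```python
import hashlib
import hmac
import struct
from typing import List, Optional

def key_from_biometric(features: List[float]) -> bytes:
    """Naively derive a 256-bit key by hashing the raw feature vector."""
    return hashlib.sha256(struct.pack(f"<{len(features)}d", *features)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

private_key = bytes(range(32))                   # hypothetical long private key
enrolled = [0.52, 0.31, 0.77, 0.18]              # enrollment sample
wrapped = xor(private_key, key_from_biometric(enrolled))
check = hashlib.sha256(private_key).digest()     # assumed check value to detect success

def try_unwrap(sample: List[float]) -> Optional[bytes]:
    """Derive a key from a fresh sample and attempt to recover the private key."""
    candidate = xor(wrapped, key_from_biometric(sample))
    if hmac.compare_digest(hashlib.sha256(candidate).digest(), check):
        return candidate
    return None

print(try_unwrap(enrolled) == private_key)    # True only for the bit-identical sample
print(try_unwrap([0.52, 0.31, 0.77, 0.181]))  # None: a tiny change breaks decryption
```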