(a) Field of the Invention
The present invention relates to a method and an apparatus for deduplicating encrypted data.
(b) Description of the Related Art
A data deduplication technique avoids storing identical data repeatedly by removing the duplicated parts. This technique is very important because it saves storage space in a database (DB) or a cloud system and prevents redundant data from being transmitted over a network, thereby reducing bandwidth consumption.
A deduplication technique for encrypted data encrypts data to protect the privacy of data stored in a remote data storage such as a DB system or a cloud service, so that cipher text data rather than plain text data is stored, and then removes duplicated cipher text data. According to an encryption method of the related art, each user who encrypts a plain text uses his or her own unique encryption key, so that a plurality of different cipher texts is created for one and the same plain text. Therefore, although plain text data can be deduplicated, the encrypted data corresponding to the same plain text data differs from user to user, so that deduplication yields no saving of storage space.
In order to deduplicate encrypted data, a paper entitled “Reclaiming Space from Duplicate Files in a Serverless Distributed File System” was presented in 2002, and the concept of convergent encryption (CE) was suggested for the first time. In the CE, an encryption key is derived by a deterministic function, such as a value obtained by hashing a message, and the plain text message is encrypted under the encryption key by a block encryption algorithm to create a cipher text. As a result, when different users have the same plain text and encrypt the plain text with the CE, the same encryption key is derived for every user. That is, even though the plain text data is possessed by different users, only one cipher text data corresponding to the plain text data is created, so that the encrypted data may be deduplicated. However, the paper mentioned only the basic concept of the CE and did not describe a specific configuration and operation for deduplicating encrypted data between a client which requests that the data be encrypted and stored and a server (or a DB) which stores the data.
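The difference between per-user keys and a message-derived key can be illustrated with a minimal sketch. The function names (`toy_cipher`, `convergent_encrypt`, `random_key_encrypt`) are hypothetical, SHA-256 is assumed as the hash function, and `toy_cipher` is only a deterministic stand-in for a block encryption algorithm; it is not taken from the cited paper and is not suitable for real use.

```python
import hashlib
import os


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream.

    Hypothetical stand-in for a block encryption algorithm: deterministic
    for a given key and its own inverse. Not secure for real use.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def convergent_encrypt(plaintext: bytes):
    """Derive the key from the message itself, then encrypt under it."""
    key = hashlib.sha256(plaintext).digest()
    return key, toy_cipher(key, plaintext)


def random_key_encrypt(plaintext: bytes):
    """Related-art behavior: every user picks an independent key."""
    key = os.urandom(32)
    return key, toy_cipher(key, plaintext)


message = b"the same file held by two different users"

# Two users encrypting the same plain text with CE.
key_alice, ct_alice = convergent_encrypt(message)
key_bob, ct_bob = convergent_encrypt(message)

# Two users encrypting the same plain text with their own keys.
_, ct_r1 = random_key_encrypt(message)
_, ct_r2 = random_key_encrypt(message)

print("CE cipher texts identical:", ct_alice == ct_bob)
print("random-key cipher texts identical:", ct_r1 == ct_r2)
```

Because the key depends only on the message, the two CE cipher texts coincide and can be deduplicated, whereas independently chosen keys yield unrelated cipher texts.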
Further, based on the above paper, a paper on message-locked encryption and secure deduplication entitled “Message-Locked Encryption and Secure Deduplication” was presented in order to deduplicate encrypted data. This paper suggested four encrypted data deduplication techniques (CE, HCE1, HCE2, and RCE), which have the following features and problems.
According to the CE technique of the related art described in “Message-Locked Encryption and Secure Deduplication”, the client hashes the plain text data to create an encryption key and encrypts the plain text data under the encryption key with a block encryption algorithm to create encrypted data. Thereafter, the client transmits the encrypted data to the server. The server which receives the encrypted data from the client uses a value obtained by hashing the encrypted data as a tag in order to identify the received encrypted data. The server compares the calculated tag with a tag list which is stored in the storage of the server, and stores the received encrypted data and the tag only when there is no matching tag. When a tag in the stored tag list matches the calculated tag, the transmitted encrypted data duplicates data which is already stored in the storage of the server. Therefore, the server does not store the encrypted data which is transmitted from the client. As a result, a deduplication process for the same encrypted data is enabled. However, according to the CE technique, the client unconditionally transmits the encrypted data to the server and the server determines whether to deduplicate the encrypted data, which significantly wastes the network bandwidth. Further, the server bears the burden of calculating tags for all transmitted encrypted data. Particularly, when the data size is very large, as with motion picture data, the burden of calculating the tag is significant.
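The CE upload flow described above may be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the cited paper: the class `CEServer`, the helper names, and the `bytes_received` counter are hypothetical, SHA-256 is assumed as the hash function, and `toy_cipher` is a deterministic stand-in for a block encryption algorithm that is not secure for real use.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher (SHA-256 keystream XOR)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def ce_client_encrypt(plaintext: bytes) -> bytes:
    """Client side: key = hash of the plain text, then encrypt."""
    key = hashlib.sha256(plaintext).digest()
    return toy_cipher(key, plaintext)


class CEServer:
    """Server side: the tag is computed by the server from the cipher text."""

    def __init__(self):
        self.store = {}          # tag -> encrypted data
        self.bytes_received = 0  # counts every transmitted byte

    def upload(self, ciphertext: bytes) -> bool:
        # The full cipher text has already crossed the network here.
        self.bytes_received += len(ciphertext)
        # The server hashes all received data to obtain the tag.
        tag = hashlib.sha256(ciphertext).hexdigest()
        if tag in self.store:
            return False         # duplicate: discard the received data
        self.store[tag] = ciphertext
        return True              # new data: store cipher text and tag


server = CEServer()
message = b"large motion picture data" * 1000

first = server.upload(ce_client_encrypt(message))   # first client
second = server.upload(ce_client_encrypt(message))  # second client

print("stored on first upload:", first)
print("stored on second upload:", second)
print("objects stored:", len(server.store))
print("bytes transmitted:", server.bytes_received)
```

In this sketch only one object is stored, yet the byte counter reflects both uploads, which corresponds to the bandwidth and tag-calculation burden noted above.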
According to the HCE1 technique of “Message-Locked Encryption and Secure Deduplication” of the related art, the client hashes the plain text data to obtain an encryption key and hashes the encryption key again to create a tag. That is, unlike the CE technique, the HCE1 technique uses a hash value of the encryption key, rather than a hash value of the encrypted data, as the tag for identifying the encrypted data. Thereafter, the client transmits the encrypted data and the tag to the server. When a tag in the tag list stored in a database of the server matches the received tag, the server does not store the received encrypted data. When there is no matching tag, the server stores the received encrypted data and the tag. In this manner, the server may perform a deduplication process on the encrypted data. However, as mentioned in the paper, the HCE1 technique is vulnerable to a duplicate faking attack and a data erasure attack, so that the HCE1 technique is inappropriate for use in an actual environment.
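The following sketch illustrates the HCE1 upload flow and one way in which a duplicate faking attack could proceed when the server cannot check a client-supplied tag against the encrypted data. All names (`HCE1Server`, `hce1_client`, `toy_cipher`) are hypothetical, SHA-256 is assumed as the hash function, `toy_cipher` is an insecure stand-in for a block encryption algorithm, and the attack scenario is an illustrative assumption rather than a reproduction of the analysis in the cited paper.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher (SHA-256 keystream XOR)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def hce1_client(plaintext: bytes):
    """Client side: key = H(M), tag = H(key), cipher text = E(key, M)."""
    key = hashlib.sha256(plaintext).digest()
    tag = hashlib.sha256(key).hexdigest()
    return key, tag, toy_cipher(key, plaintext)


class HCE1Server:
    """Server side: trusts the tag sent by the client."""

    def __init__(self):
        self.store = {}  # tag -> encrypted data

    def upload(self, tag: str, ciphertext: bytes) -> bool:
        if tag in self.store:
            return False             # treated as a duplicate
        self.store[tag] = ciphertext
        return True

    def download(self, tag: str) -> bytes:
        return self.store[tag]


server = HCE1Server()
message = b"a widely shared document"
bogus = b"attacker-chosen replacement"

# Assumed attacker: knows the message, so can compute its key and tag,
# but uploads encrypted bogus data under the genuine tag.
key, tag, ciphertext = hce1_client(message)
attacker_stored = server.upload(tag, toy_cipher(key, bogus))

# Honest client uploads afterwards; the server sees a matching tag.
honest_stored = server.upload(tag, ciphertext)

# Honest client later downloads and decrypts without any tag check.
recovered = toy_cipher(key, server.download(tag))
print("honest upload stored:", honest_stored)
print("recovered original message:", recovered == message)
```

Under these assumptions the honest upload is discarded as a duplicate and the honest client later recovers the attacker's data instead of its own.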
Further, according to the HCE2 technique of “Message-Locked Encryption and Secure Deduplication” of the related art, the uploading process is the same as that of the HCE1 technique but the downloading process is different. As in the HCE1 technique, the client transmits a tag which is stored in the client to the server in order to download the encrypted data, and the server transmits the encrypted data which matches the received tag to the client. According to the HCE1 technique, the client then decrypts the received encrypted data using the encryption key which is stored in the client to obtain the plain text data and ends the operation. According to the HCE2 technique, however, after obtaining the plain text data by decryption, the client calculates a hash value of the decrypted plain text data and hashes the hash value again to create a new tag. The client compares the stored tag with the newly created tag and, when the two tags match, stores the plain text data for the received encrypted data and ends the operation. When the two tags do not match, the client deletes all data related to the received encrypted data and ends the operation. According to this technique, the duplicate faking attack mentioned for the HCE1 technique may be prevented, but the technique is still vulnerable to the data erasure attack, so that the technique is inappropriate for use in an actual environment. For example, assume that the client stores its own encrypted data in the cloud server and then deletes its local copy of the data. There may be a serious problem in that, when the client later tries to download its own encrypted data, the data has disappeared due to an erasure attack by a malicious attacker, so that the data can never be recovered.
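The tag consistency check performed by the client during an HCE2 download may be sketched as follows. The function names (`hce2_tag`, `hce2_download_check`, `toy_cipher`) are hypothetical, SHA-256 is assumed as the hash function, and `toy_cipher` is an insecure stand-in for a block encryption algorithm; the sketch is an illustration under these assumptions, not the implementation of the cited paper.

```python
import hashlib
from typing import Optional


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher (SHA-256 keystream XOR)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def hce2_tag(plaintext: bytes) -> str:
    """Tag = hash of the hash of the plain text (hash of the key)."""
    key = hashlib.sha256(plaintext).digest()
    return hashlib.sha256(key).hexdigest()


def hce2_download_check(stored_tag: str, key: bytes,
                        ciphertext: bytes) -> Optional[bytes]:
    """Decrypt, recompute the tag, and accept only on a match."""
    plaintext = toy_cipher(key, ciphertext)
    if hce2_tag(plaintext) == stored_tag:
        return plaintext   # tags match: keep the plain text
    return None            # tags differ: discard everything received


message = b"client data kept only in the cloud"
key = hashlib.sha256(message).digest()
stored_tag = hce2_tag(message)

genuine = toy_cipher(key, message)                # what the client uploaded
forged = toy_cipher(key, b"attacker-chosen data") # what a faked duplicate holds

accepted = hce2_download_check(stored_tag, key, genuine)
rejected = hce2_download_check(stored_tag, key, forged)

print("genuine data accepted:", accepted == message)
print("forged data rejected:", rejected is None)
```

The check lets the client detect substituted data, but in the rejected case the client is left with nothing, which is consistent with the remaining exposure to the data erasure attack described above.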
Further, according to the RCE technique of “Message-Locked Encryption and Secure Deduplication” of the related art, the client may have a private secret key, which is unique for every client, in addition to the encryption key which is derived from the message. The operation method of the RCE technique is the same as that of the HCE2 technique except that the client creates two pieces of encrypted data using the private secret key when uploading the data to the server and inspects tag consistency using the two pieces of encrypted data during the download process. However, similarly to the HCE2 technique, the RCE technique is vulnerable to the data erasure attack.
As examined above, the CE based and message-locked encryption (MLE) based encrypted data deduplication processing methods have the following problems. In the CE based encrypted data deduplication processing technique, all encrypted data is transmitted from the client to the server, so that the network bandwidth is significantly wasted, and the server needs to calculate tags for all encrypted data, so that the calculation burden is very large. Further, the encrypted data deduplication processing technique based on the HCE1 method is vulnerable to both the duplicate faking attack and the data erasure attack, so that the technique is inappropriate for use in an actual environment. Further, the HCE2 and RCE based encrypted data deduplication processing methods are also vulnerable to the data erasure attack, so that the data of the client may be deleted from the server. Furthermore, these methods describe only a data encryption and decryption method and a tag creating method for the deduplication process of the encrypted data, and do not clearly describe a specific operation of the deduplication process of the encrypted data in the client and the server.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.