The invention relates to the field of computer science, and more specifically, to a computer-implemented method, a computer program, a data storage medium and a system for performing remote data storage.
Nowadays, more and more corporate and private users outsource their data to cloud storage providers. With the rapidly increasing amounts of data produced worldwide, networked and multi-user storage systems are becoming very popular, thanks to their accessibility and moderate cost.
In this context, various cost-effective storage optimization techniques have been developed to save space, given the total volume of data at stake. The effectiveness of storage efficiency functions, such as compression and deduplication, matters to both the storage provider and the customer: high compression and deduplication ratios allow optimal usage of the resources of the storage provider and, consequently, lower costs for its users.
Several deduplication schemes have been proposed by the research community, for example in the following papers:    Dirk Meister and André Brinkmann. Multi-level comparison of data deduplication in a backup scenario. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, SYSTOR '09, pages 8:1-8:12, New York, N.Y., USA, 2009. ACM;    Nagapramod Mandagere, Pin Zhou, Mark A Smith, and Sandeep Uttamchandani. Demystifying data deduplication. In Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion, Companion '08, pages 12-17, New York, N.Y., USA, 2008. ACM; or    Lior Aronovich, Ron Asher, Eitan Bachmat, Haim Bitner, Michael Hirsch, and Shmuel T. Klein. The design of a similarity based deduplication system. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, SYSTOR '09, pages 6:1-6:14, New York, N.Y., USA, 2009. ACM.
At the same time, recent data breach incidents make security an increasingly prominent requirement. Indeed, one obstacle still prevents many users from migrating data to remote storage: data security. The conventional means to address concerns over the loss of governance for outsourced data is to encrypt it before it leaves the premises of its owner.
While sound from a security perspective, this approach prevents the storage provider from applying any space- or bandwidth-saving functions, such as deduplication. On the other hand, most works related to deduplicating systems do not consider security as a concern.
Recently, however, a paper presented a number of attacks that can lead to data leakage in storage systems in which client-side deduplication is in place: D. Harnik, B. Pinkas, and A. Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. Security & Privacy, IEEE, 8(6):40-47, November-December 2010.
To thwart such attacks, the concept of proof of ownership has been introduced in the following papers:    Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Proofs of ownership in remote storage systems. In Proceedings of the 18th ACM conference on Computer and communications security, CCS '11, pages 491-500, New York, N.Y., USA, 2011. ACM; and    Roberto Di Pietro and Alessandro Sorniotti. Boosting efficiency and security in proof of ownership for deduplication. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS '12, pages 81-82, New York, N.Y., USA, 2012. ACM.
None of these works, however, can provide real end-user confidentiality in the presence of a malicious or honest-but-curious cloud provider.
Also known is a proof-of-ownership (PoW) scheme that allows client-side deduplication in a bounded leakage setting, as presented in the following paper: Jia Xu, Ee-Chien Chang, and Jianying Zhou. Leakage-resilient client-side deduplication of encrypted data in cloud storage. Cryptology ePrint Archive, Report 2011/538, 2011. The authors provide a security proof for their solution in the random oracle model, but the work does not address the problem of low min-entropy files.
Regarding encrypting data, convergent encryption is known as a cryptographic primitive, presented for example in two papers:    John R. Douceur, Atul Adya, William J. Bolosky, Dan Simon, and Marvin Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), ICDCS '02, starting from page 617, Washington, D.C., USA, 2002. IEEE Computer Society; and    Mark W. Storer, Kevin Greenan, Darrell D. E. Long, and Ethan L. Miller. Secure data deduplication. In Proceedings of the 4th ACM international workshop on Storage security and survivability, StorageSS '08, pages 1-10, New York, N.Y., USA, 2008. ACM.
Convergent encryption attempts to combine data confidentiality with the possibility of data deduplication. Convergent encryption of a message consists of encrypting the plaintext using a deterministic (symmetric) encryption scheme with a key that is deterministically derived solely from the plaintext. Clearly, when two users independently encrypt the same file, they generate the same ciphertext, which can be easily deduplicated. Unfortunately, convergent encryption does not provide semantic security, as it is vulnerable to content-guessing attacks: anyone who can enumerate candidate plaintexts can encrypt each one and compare the result with a stored ciphertext. Subsequent research formalized convergent encryption under the name message-locked encryption, as presented in the following paper: Mihir Bellare, Sriram Keelveedhi, and Thomas Ristenpart. Message-locked encryption and secure deduplication. Cryptology ePrint Archive, Report 2012/631, 2012. As expected, the security analysis presented in that work highlights that message-locked encryption offers confidentiality for unpredictable messages only, clearly failing to achieve semantic security.
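As a purely illustrative aid, the convergent encryption principle and its exposure to content guessing can be sketched as follows. This is a minimal sketch and not the construction of any cited paper: the function names (derive_key, convergent_encrypt, guess_content) are hypothetical, the key derivation by SHA-256 is an assumption, and the hash-based keystream is a toy stand-in for a real deterministic symmetric cipher that must not be used to protect actual data.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def derive_key(plaintext: bytes) -> bytes:
    """Derive the key solely from the plaintext (assumed here: SHA-256)."""
    return hashlib.sha256(plaintext).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream: SHA-256(key || counter). Illustrative only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext); both depend only on the plaintext."""
    key = derive_key(plaintext)
    stream = keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Invert convergent_encrypt given the plaintext-derived key."""
    stream = keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def guess_content(ciphertext: bytes, candidates: Iterable[bytes]) -> Optional[bytes]:
    """Content-guessing attack: encrypt each candidate and compare ciphertexts."""
    for candidate in candidates:
        if convergent_encrypt(candidate)[1] == ciphertext:
            return candidate
    return None


if __name__ == "__main__":
    # Two users encrypting the same file independently obtain the same ciphertext,
    # so the storage provider can deduplicate without seeing the plaintext.
    _, c_alice = convergent_encrypt(b"salary: 50000")
    _, c_bob = convergent_encrypt(b"salary: 50000")
    print(c_alice == c_bob)

    # A low min-entropy file is recovered by enumerating plausible contents.
    candidates = [b"salary: %d" % n for n in range(40000, 60001, 10000)]
    print(guess_content(c_alice, candidates))
```

The same determinism that makes deduplication possible is what enables the attack: no secret other than the message itself enters the encryption, so a predictable message offers no protection.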
In this context, there is still a need for an improved solution for performing remote data storage.