To secure data at rest, most secure storage systems use encryption. A key feature of such a system is the ability to respond to a compromised encryption key. Most known systems implement this functionality by walking the storage system namespace, decrypting the contents with the old key, and re-encrypting the data with a new key. Such an implementation is slow to respond to the security threat posed by a compromised key, because the walk is a long-running process and sensitive data remains encrypted with the compromised key until the walk reaches it. If the new key is also compromised in the middle of the re-encryption process, this adds further implementation challenges for storage system designers.
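The namespace walk described above can be sketched as follows. This is an illustrative sketch only, not the implementation of any particular system: the names `toy_cipher` and `rekey_namespace` are hypothetical, the namespace is modeled as an in-memory dictionary, and the XOR keystream is an insecure stand-in for a real cipher. The point is that every stored byte must be read, decrypted, and rewritten before the compromised key can be retired.

```python
import hashlib
from typing import Dict


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream.

    Toy stand-in for a real cipher (assumption for illustration only);
    it is NOT secure. Applying it twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def rekey_namespace(namespace: Dict[str, bytes], old_key: bytes, new_key: bytes) -> int:
    """Walk every object, decrypt with old_key, re-encrypt with new_key.

    Returns the number of bytes processed, which grows with the total
    size of the namespace rather than with the size of the key.
    """
    processed = 0
    for path in list(namespace):
        plaintext = toy_cipher(old_key, namespace[path])
        namespace[path] = toy_cipher(new_key, plaintext)
        processed += len(plaintext)
    return processed
```

Until the loop reaches a given object, that object is still protected only by the compromised key, which is why the response time of this approach scales with the amount of stored data.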
In a snapshot-based or deduplicated system, such a feature is difficult to implement because the same data blocks are shared among multiple entities (e.g., multiple files and/or snapshots). The file system has to keep track of multiple keys and how they map to individual data blocks. Keeping the system accessible throughout the re-encryption process adds further challenges. Storage replication adds another dimension to data security when a key is compromised, because the data may reside in multiple locations, possibly encrypted with the same compromised key.
Crypto shredding has been used for data sanitization to prevent shredded data from being recovered, but it is a slow operation for large storage systems and is especially challenging for a deduplicated storage system. Existing data sanitization techniques lack the ability to sanitize data instantly. Techniques that do not use crypto shredding are inherently slower. Even crypto shredding requires frequently rotating the keys of a file system namespace while forgetting (deleting) the older keys. This rotation is itself expensive, because the entire file system namespace must be decrypted and re-encrypted.
For deduplicated storage systems, a key problem is efficiently identifying unreferenced data blocks. In deduplicated systems, the same data blocks can be shared among multiple entities. For the sake of efficiency, some of these systems do not maintain reference counts for individual data blocks. This makes it difficult to determine which data blocks are still active in the storage system namespace. The problem is compounded as the storage system scales into the multiple-terabyte range. Storage systems with a snapshot/clone feature that shares blocks among multiple snapshots suffer from similar complexities. It is difficult to design a storage system that can sanitize an individual file, directory, snapshot, or clone.
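Without per-block reference counts, one common way to find unreferenced blocks is a mark-and-sweep pass over the whole namespace. The sketch below is a minimal illustration under stated assumptions, not a description of any specific system: the function name `find_unreferenced_blocks` is hypothetical, and the namespace is modeled as a dictionary mapping each entity (file, snapshot, or clone) to the identifiers of the blocks it references.

```python
from typing import Dict, Iterable, List, Set


def find_unreferenced_blocks(
    namespace: Dict[str, List[str]], stored_blocks: Iterable[str]
) -> Set[str]:
    """Return stored block IDs that no entity in the namespace references.

    Mark phase: visit every entity and collect the blocks it uses.
    Sweep phase: any stored block that was not marked is unreferenced.
    """
    live: Set[str] = set()
    for block_ids in namespace.values():
        live.update(block_ids)
    return set(stored_blocks) - live


# Two entities share block "b2"; block "b4" is referenced by neither.
namespace = {"file1": ["b1", "b2"], "snap1": ["b2", "b3"]}
unreferenced = find_unreferenced_blocks(namespace, ["b1", "b2", "b3", "b4"])
```

Because a shared block stays live as long as any one entity references it, the mark phase has to visit every entity before a single block can be declared unreferenced, which is what makes the operation costly at scale.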