The evolution of computing and storage resources in the cloud is posing new challenges to data security. Various cloud storage services are being offered by vendors such as Amazon Web Services (AWS, a subsidiary of Amazon.com), Google and Microsoft. AWS, for instance, offers the Simple Storage Service (S3) and the Elastic Block Store (EBS). Unlike a Hadoop Distributed File System (HDFS), which relies on local storage, these services rely on remote storage hosted in their respective data centers.
In a locally operated HDFS, increasing storage space means either adding larger hard drives to existing nodes or adding more machines to the cluster. This exercise is usually costlier and more complicated than expanding cloud storage. Furthermore, unlike in HDFS, users cannot run code in a pure file storage service such as S3, where file operations are limited to various flavors of Get, Put, Copy and Delete.
In general, the requirements of securing cloud storage services, which are the focus of the present disclosure, do not apply to an Apache Hadoop “stack” or architecture and its associated computing paradigms. Thus, the present technology focuses on securing data in cloud storage, or simply the cloud. The vendors of such cloud storage services include AWS, as noted above, as well as Google Cloud and Microsoft Azure.
Let us take the example of encrypting data in an S3 cloud. Most of the options provided by S3 involve sending the data to S3 and encrypting it only once it is there. More precisely, the data is encrypted in transit by Secure Sockets Layer (SSL). However, once it is received at the S3 cloud, it is decrypted to its plaintext form before it is encrypted with S3 encryption for storage.
S3 encryption uses a per-object data key and a wrapping-key. The administrator manually specifies the wrapping-key to use. The wrapping-keys are stored/managed in AWS's own key manager or key management system (KMS) residing in the cloud, and can have access control lists (ACL's) defined on them. AWS's cloud-based KMS allows Amazon employees access to the keys (although it takes two cooperating employees to do so).
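The two-level key hierarchy described above (a per-object data key protected by a wrapping-key) can be sketched as follows. This is a minimal illustration, not AWS's implementation: the function names `encrypt_object` and `decrypt_object` are hypothetical, and `keystream_xor` is a placeholder cipher built from SHA-256 so that the sketch needs only the Python standard library. A real system would use AES for both the data encryption and the key wrapping.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder cipher: XOR data with a SHA-256 counter keystream.

    Stands in for a real cipher such as AES. NOT suitable for production;
    it only keeps this sketch dependency-free. Applying it twice with the
    same key and nonce restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def encrypt_object(wrapping_key: bytes, plaintext: bytes) -> dict:
    """Encrypt one object under a fresh data key, then wrap that data key."""
    data_key = secrets.token_bytes(32)   # per-object data key
    data_nonce = secrets.token_bytes(16)
    wrap_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": keystream_xor(data_key, data_nonce, plaintext),
        "data_nonce": data_nonce,
        # Only the wrapped form of the data key is stored with the object.
        "wrapped_key": keystream_xor(wrapping_key, wrap_nonce, data_key),
        "wrap_nonce": wrap_nonce,
    }


def decrypt_object(wrapping_key: bytes, record: dict) -> bytes:
    """Unwrap the data key with the wrapping-key, then decrypt the object."""
    data_key = keystream_xor(
        wrapping_key, record["wrap_nonce"], record["wrapped_key"]
    )
    return keystream_xor(data_key, record["data_nonce"], record["ciphertext"])
```

The point of the hierarchy is that whoever holds the wrapping-key can recover every data key wrapped under it, which is why the location of the wrapping-key (the cloud KMS versus the client) matters.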
In a similar manner, AWS supports partition-based encryption for EBS, but only with AWS's KMS, which again allows Amazon access to the keys. Amazon S3 also supports cloud-based hardware security modules (HSM's), employing SafeNet Luna HSM's. Objects in S3 are stored in buckets and identified by a key. ACL's may be defined on both the buckets and individual objects. In comparison, EBS adds permanent virtual disk partitions to AWS's Elastic Compute Cloud (EC2) instances. In Linux, they appear as raw disk devices, the same way as a disk partition does.
S3 can use authenticated encryption, namely the Advanced Encryption Standard (AES) in Galois/Counter Mode (GCM), to detect whether the ciphertext has been modified. As noted above, while the communication of plaintext data to S3 is secured from eavesdropping attacks by virtue of SSL encryption over the HTTPS protocol, Amazon still has access to the plaintext data. Such a design, in which a cloud storage provider such as AWS/Amazon has access to the customer's/client's plaintext data or to its encryption keys, does not meet the security requirements of many applications and organizations.
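The tamper-detection property of authenticated encryption mentioned above can be illustrated with a short sketch. AES-GCM is not available in the Python standard library, so this example substitutes an HMAC-SHA256 tag computed over the ciphertext (an encrypt-then-MAC arrangement) purely to show the behavior: any modification of the stored bytes causes verification to fail. The names `seal` and `open_sealed` are hypothetical, and the ciphertext is treated as opaque bytes.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an authentication tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, sealed: bytes) -> bytes:
    """Return the ciphertext if the tag verifies; raise ValueError otherwise."""
    if len(sealed) < TAG_LEN:
        raise ValueError("sealed blob is too short")
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was modified")
    return ciphertext


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    blob = seal(key, b"opaque ciphertext bytes")
    tampered = bytes([blob[0] ^ 1]) + blob[1:]  # flip one bit
    try:
        open_sealed(key, tampered)
    except ValueError as err:
        print("rejected:", err)
```

In GCM the tag is produced by the cipher mode itself rather than by a separate MAC, but the check performed on retrieval is analogous.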
Therefore, Amazon/AWS S3 also provides a client-side encryption class in Java that encrypts data in the client's network before it is sent to the S3 cloud. Conversely, the encrypted data is retrieved from S3 and then decrypted at the client. However, this encryption class does not provide any key management functionality; it is left to the code developers at the client side to provide key management functions as well. This usually results in manual key management by the client, which is cumbersome and encourages the use of only a few keys, thus compromising security.
Alternatively, the client can use an off-the-shelf KMS, but integrating with a client-side KMS requires significant code development by the client. This is because the AWS Software Development Kit (SDK) does not have the requisite interfaces/hooks for key management. As mentioned, AWS S3 provides its own KMS for key management, which the client can interface with, but this option makes the keys available to AWS.
In addition to storing data in cloud storage such as AWS S3 and AWS EBS, data processing itself can also be moved from the client environment/network to the cloud. An example of such a compute service is AWS EC2. However, processing in the cloud has its own security implications, described below. While these scenarios are also possible when data processing is performed on-premises at the client, access to the client network is typically much more restricted than access to a publicly accessible cloud.
First, the file or data keys are vulnerable because they are present in memory in the cloud, even when they are not saved to disk. Similarly, a policy engine/server running in the cloud is also vulnerable to attack. Since it receives wrapping-keys from the key manager(s) or Key Management Interoperability Protocol (KMIP) server(s), a compromise of a wrapping-key in a policy engine could give an attacker the ability to decrypt a large number of file-keys.
Additionally, the private key for the digital certificate used to authenticate a policy engine to a key manager is also vulnerable if it is stored in the cloud, such as on an EC2 instance. It can be attacked or (theoretically) obtained by Amazon. The key manager(s) are also vulnerable to attack in the cloud, since an attacker may be able to read decrypted keys from the cloud, or access the master keys stored in a cloud-based HSM and use them to decrypt keys stored in the key database(s) of the key manager(s). As noted above, these scenarios are also theoretically possible when data processing is performed on-premises in the client's network, but access to that network is typically more restricted and only available to trusted users.
There are many prior art teachings that address protecting data in cloud storage. U.S. Patent Publication No. 2012/0278622 A1 to Lesavich discloses a method and system for electronic content storage and retrieval with Galois Fields on cloud computing networks. The electronic content is divided into many portions stored in many cloud storage objects. Storage locations for the many cloud storage objects are selected using a Galois field and the many cloud storage objects are distributed across the cloud network. When the electronic content is requested, the many portions are retrieved and transparently combined back into the original electronic content. No server network devices or target network devices can individually determine the locations of all portions of the electronic content on the cloud communications network, thereby providing layers of security and privacy for the electronic content on the cloud communications network.
U.S. Patent Publication No. 2012/0134491 A1 to Liu teaches a cloud storage security technology for encrypting the data partially. First, a size H of a random seed is calculated based on the amount of data X that is expected to be stored within some preset time, a proportion of local storage space R and the security level of data Z. Then, based on the amount Y of plaintext data handled each time, a number of data acquisitions u is calculated. Then, based on the number u, data of size H is acquired several times to generate a plaintext encryption bit identifier data string. Then, using the data string, more than one half of the plaintext data is selected for encryption to ciphertext. The teachings purportedly reduce the amount of encrypted data to be stored without sacrificing the degree of data security protection, thus improving cloud encryption/decryption performance.
U.S. Patent Publication No. 2014/0122866 A1 to Haeger discloses a proxy that receives a file to be stored by a cloud storage server, from a client node. The proxy and the client node are parts of a private network that does not include the cloud storage server. The proxy retrieves an encryption key associated with a user of the client node and encrypts the file using the encryption key. The proxy then transmits the encrypted file to the cloud storage server.
Besides the above prior art teachings in the patent literature, other industry products that provide encryption support for S3 are SafeNet ProtectApp, SafeNet ProtectV and CloudBerry Explorer Pro.
What is absent from the prior art is a comprehensive cloud security management system having the following features:
Allows a customer to keep sole control of its data as well as sole access to its encryption keys.
Provides automated key management.
Supports other Key Management Interoperability Protocol (KMIP) compliant key managers/KMS's.
Encrypts data before sending it from the customer/client network to the cloud, and decrypts it after retrieving it from the cloud.
Encrypts the data so that its integrity is protected (authenticated encryption).
Has a centralized security policy, defined by the administrator, that determines what data is encrypted and specifies the granularity of the wrapping-keys.
Supports pure cloud solutions (such as S3).
Supports multiple cloud storage services through their corresponding application programming interfaces (API's) and requires little or no modification to the existing client code.
Supports files larger than 64 GB.
Encrypts files in a range of blocks or bytes to allow for efficient reads of a part of an encrypted file without decrypting the whole file.
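The last two features can be related by a short sketch. As background, AES-GCM limits a single encryption under one key and nonce to 2^39 − 256 bits of plaintext, which is just under 64 GiB; reading the 64 GB figure above as a reference to this limit is an assumption, since the text does not state its origin. Encrypting a file as independent fixed-size blocks sidesteps any such per-invocation limit and also allows a partial read to fetch only the blocks that cover the requested bytes. The block size, tag size, stored layout (each block followed by its tag) and the name `plan_read` below are all hypothetical choices made for illustration.

```python
from typing import NamedTuple

BLOCK_SIZE = 1024 * 1024  # hypothetical plaintext block size (1 MiB)
TAG_LEN = 16              # assumed per-block authentication tag size in bytes


class ReadPlan(NamedTuple):
    first_block: int   # index of the first encrypted block to fetch
    last_block: int    # index of the last encrypted block to fetch (inclusive)
    cipher_start: int  # byte offset of first_block within the stored object
    cipher_end: int    # exclusive upper bound of the stored range to fetch
    skip: int          # plaintext bytes to discard from the first block


def plan_read(offset: int, length: int) -> ReadPlan:
    """Map a plaintext byte range to the encrypted blocks that cover it."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    stored_block = BLOCK_SIZE + TAG_LEN  # each stored block carries its tag
    return ReadPlan(
        first_block=first,
        last_block=last,
        cipher_start=first * stored_block,
        # The final block of a file may be shorter, so a caller would clamp
        # this bound to the actual object size.
        cipher_end=(last + 1) * stored_block,
        skip=offset - first * BLOCK_SIZE,
    )
```

For example, a 100-byte read starting 5 bytes into the fourth block yields a plan that fetches and decrypts only that one block, regardless of the total file size.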
The above-mentioned features, absent from the prior art, would benefit organizations that use commercial cloud data storage and want to encrypt their data while controlling the keys. This is because such organizations do not trust the cloud storage vendors to secure their data. They may also have regulatory requirements that require them to control their keys. They may also want to be able to switch cloud storage vendors, or use multiple cloud data storage vendors, without having to implement vendor-specific encryption for each one.