Technical Field
The present invention relates generally to cloud computing and, in particular, to enabling secure Big Data analytics in the cloud.
Description of the Related Art
One of the main challenges in providing Big Data Analytics-as-a-Service in the public cloud is the security of the enterprise data that is transferred to and stored in public cloud storage facilities and used for further analytics processing on Big Data platforms such as Hadoop®. While the enterprise data storage environment can be trusted for storing sensitive enterprise data, once this data is transferred to the cloud it is exposed to the system administrators of the public cloud service provider, who can access the data at any time. While the data can be secured before being uploaded to the public cloud, this complicates any further analytics processing that the data might undergo in the cloud using Analytics-as-a-Service, since analytics algorithms running on the Hadoop® platform cannot process encrypted/secured data. We note that such Analytics-as-a-Service can involve the deployment and provisioning of analytics processing platforms such as Hadoop® on physical/virtual machines in the cloud.
Current solutions to this problem either: (approach a) assume that the enterprise fully trusts the public cloud service provider not to access its data once the data is uploaded there in clear text, which might not be acceptable if the data is sensitive (e.g., due to regulatory and/or legal reasons and/or so forth); or (approach b) attempt to fully encrypt the underlying file system of the physical/virtual machines onto which the Big Data platform is deployed.
The latter approach (approach b) has certain drawbacks, including: (1) inflexible separation between secured and unsecured enterprise data that might be needed for analytics processing, with no fine-grained control over which files are secured and which are not; (2) requiring configuration changes to the underlying operating system onto which the Big Data analytics platform is deployed, which might create additional administrative and management costs for the public cloud service provider; and (3) not allowing for fine-grained control over the security algorithms and parameters used on a per data file basis.
The former approach (approach a) has certain drawbacks, including: (1) non-compliance with regulatory requirements (e.g., Federal Financial Institutions Examination Council (FFIEC), Health Insurance Portability and Accountability Act (HIPAA), etc.) that might dictate that certain data of a sensitive nature (e.g., financial records, health records, etc.) cannot be stored in clear text with public cloud service providers; (2) public cloud service providers may have to respond to government subpoenas and other legal actions that could force them to hand over to authorities data that the original owner of the data might not want to reveal (e.g., a company based in one jurisdiction, such as Germany, might not want sensitive data handed over to U.S. entities if the cloud service provider is based in the U.S.); and (3) it is not always possible to guarantee safeguards on data that resides with public cloud service providers, and hence to trust that the enterprise data will be handled with the same care and procedures that the enterprise itself follows.
Thus, there is a need for enabling Hadoop®-based (or other bulk synchronous processing system-based) Big Data analytics in the cloud without trusting the data storage facilities of the public cloud service provider (i.e., for providing data-at-rest security).