Batch processing may refer to the processing of data without interaction or interruption. Once started, a batch process runs to some form of completion without any user intervention. Batch processing poses several challenges. One challenge is usability, which pertains to error handling and the maintainability of code. Another challenge is scalability, because the volume of data handled by a batch job is often one or more orders of magnitude larger than that of a typical web or thick-client application. A further challenge is availability, because batch jobs typically do not run 24/7.
Batch processing can be made more efficient either by using cloud computing to offload company servers or by distributing the computation. Both solutions, however, are risky from a security perspective. For example, with cloud computing, the organization exposes data to the cloud provider. Accordingly, the cloud provider may read this data and be privy to information that the organization considers confidential. Moreover, when the computation is distributed, the data may be spread across different servers and possibly different datacenters, which may raise many security concerns.
To achieve secured batch processing in the cloud environment, a conventional approach resorts to the so-called “hybrid cloud.” The hybrid cloud is a cloud solution that combines physical servers located on the premises of the organization with physical servers located in the cloud provider's datacenter. With this conventional approach, the sensitive computations are run only on the on-premises physical servers. While such a solution is very effective at achieving security, it lacks flexibility. For example, at certain points it may be the case that most computations use sensitive data to some extent. In this case, the on-premises servers may become overloaded while the cloud resources remain underutilized, thus diminishing any value of having these resources at hand.
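The inflexibility of the conventional hybrid-cloud placement rule may be illustrated with a minimal sketch. The job names, the `sensitive` flag, the pool names, and the capacity figures below are hypothetical and chosen only for illustration; they do not describe any particular scheduler or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    cpu_hours: float
    sensitive: bool  # hypothetical flag: the job touches confidential data


def place(job: Job) -> str:
    """Conventional hybrid-cloud rule: sensitive work stays on premises."""
    return "on_prem" if job.sensitive else "cloud"


def utilization(jobs, capacity):
    """Return demanded CPU-hours divided by capacity for each server pool."""
    demand = {pool: 0.0 for pool in capacity}
    for job in jobs:
        demand[place(job)] += job.cpu_hours
    return {pool: demand[pool] / capacity[pool] for pool in capacity}


# Illustrative capacities in CPU-hours per batch window (assumed values).
capacity = {"on_prem": 100.0, "cloud": 400.0}

# A workload in which most jobs use sensitive data to some extent.
jobs = [
    Job("payroll", 80.0, True),
    Job("billing", 60.0, True),
    Job("risk-report", 40.0, True),
    Job("log-rollup", 20.0, False),
]

load = utilization(jobs, capacity)
print(load)
```

With these assumed figures, the on-premises pool is asked for 1.8 times its capacity while the cloud pool is used at 5 percent, which is the imbalance described above.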
To achieve secured batch processing in a distributed environment, no effective techniques exist so far. For example, with the conventional approach discussed above, the nodes of the distributed cluster are typically placed behind a firewall of the organization's Intranet and carefully secured. Placing the nodes of the distributed cluster behind the organization's firewall may require the organization to task its best administrators with taking care of each and every such server, and it promotes greater homogeneity of the computing environment, which is simpler to manage.