Users can outsource hosting of applications and other services to cloud service providers, e.g. Amazon®, Rackspace®, Microsoft®, etc. More specifically, applications can be run on virtual machine instances in the cloud. In cases where data-intensive jobs are outsourced to the cloud, the jobs are typically performed on clusters of virtual machine instances, often in parallel. A wide variety of different virtual machine instance types are available for hosting applications and other services in the cloud. In order to outsource hosting of applications and other services, including data-intensive jobs, a user has to select the virtual machine instance types that will perform the jobs. Additionally, the user has to select a number of nodes or virtual machine instances to add to a cluster of virtual machine instances in order to perform the jobs. Costs of using virtual machine instances vary based on the instance type and the number of virtual machine instances used. Accordingly, the cost of outsourcing a job to the cloud is a function of both the number of virtual machine instances used and the types of virtual machine instances used, e.g. as part of a cluster configuration.
Currently, users choose virtual machine instance types either arbitrarily or based on previous experience of outsourcing similar jobs to the cloud. This is problematic because users might define cluster configurations that are unsuitable for performing a specific job. For example, a user might select more expensive virtual machine instance types to perform a job when less expensive virtual machine instance types could have performed the job just as effectively. There therefore exists a need for automating cluster configuration selection for outsourced jobs in order to minimize usage costs.
Further, outsourced jobs typically need to be completed within a specific amount of time, e.g. a service level objective deadline has to be met. In order to ensure that service level objective deadlines are met, users typically scale out by adding virtual machine instances to a cluster. This is often done irrespective of the actual cost of scaling out and of whether the scaling out is actually needed to complete the job by the service level objective deadline. There therefore exists a need for automating cluster configuration selection for jobs outsourced to the cloud in order to minimize usage costs while ensuring that the service level objectives for the jobs are still met, e.g. by selecting a cost-optimal cluster configuration.
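By way of illustration only, the selection problem described above can be sketched as a search over candidate cluster configurations that discards any configuration whose estimated runtime exceeds the service level objective deadline and keeps the least expensive of the remainder. The sketch below is not taken from any particular system. The names ClusterConfig, job_cost and cheapest_config_meeting_deadline, the hourly prices, and the runtime estimator passed in by the caller are all hypothetical, and cost is assumed to be hourly price multiplied by instance count and runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical cluster configuration: one instance type and a node count."""
    instance_type: str
    num_instances: int


def job_cost(config: ClusterConfig,
             hourly_price: Dict[str, float],
             runtime_hours: float) -> float:
    """Assumed cost model: price per instance-hour x instances x hours."""
    return hourly_price[config.instance_type] * config.num_instances * runtime_hours


def cheapest_config_meeting_deadline(
    candidates: Iterable[ClusterConfig],
    hourly_price: Dict[str, float],
    estimate_runtime_hours: Callable[[ClusterConfig], float],
    deadline_hours: float,
) -> Optional[Tuple[ClusterConfig, float]]:
    """Return (config, cost) for the cheapest candidate whose estimated
    runtime is within the deadline, or None if no candidate qualifies."""
    best: Optional[Tuple[ClusterConfig, float]] = None
    for config in candidates:
        runtime = estimate_runtime_hours(config)
        if runtime > deadline_hours:
            continue  # this configuration would miss the deadline
        cost = job_cost(config, hourly_price, runtime)
        if best is None or cost < best[1]:
            best = (config, cost)
    return best


if __name__ == "__main__":
    # Made-up prices and a toy runtime model, for illustration only.
    prices = {"small": 0.10, "large": 0.40}
    speed = {"small": 1.0, "large": 3.0}

    def toy_runtime(config: ClusterConfig) -> float:
        # 24 units of work split across nodes, plus a fixed 0.5 h overhead.
        return 24.0 / (speed[config.instance_type] * config.num_instances) + 0.5

    configs = [ClusterConfig(t, n) for t in prices for n in (2, 4, 8)]
    print(cheapest_config_meeting_deadline(configs, prices, toy_runtime, 4.0))
```

In this toy setting, adding instances shortens the runtime but does not necessarily lower the cost, which is why scaling out without regard to cost can lead to a more expensive configuration than the deadline requires. In practice the runtime estimator is the difficult part, and the exhaustive loop shown here is only one possible search strategy.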