High performance computing allows a customer to use compute resources in order to accomplish a job. Typically, the customer will enter into a service level agreement with a provider of compute resources, which often obliges the provider to deliver a certain level of service using those compute resources. Often, a provider of compute resources will have multiple customers (i.e., multiple tenants), each having one or more service level agreements.
As job requests are received by the provider, a scheduler schedules the job requests so that they are accomplished in a manner that satisfies all of the service level agreements. Failure to do so can often result in a breach of a service level agreement, resulting in loss of goodwill and monetary loss for the provider. Accordingly, the provider ensures that all of the machines providing the compute resources are functioning properly, and that there is sufficient redundancy to handle failure scenarios. Often, the compute resources are thus physically located in an area in which they can be maintained by the provider.
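As a rough illustration of the scheduling described above, the following minimal sketch orders job requests by deadline and places each on the machine that frees up first. It is an assumption-laden example, not the scheduler of any particular provider: the `Job` class, the `schedule` function, and the idea of reducing each customer's agreement to a single completion deadline are all hypothetical simplifications, and earliest-deadline-first is merely one common policy chosen for brevity.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Hypothetical model: the agreement is reduced to one completion deadline.
    deadline: float
    name: str = field(compare=False)
    tenant: str = field(compare=False)
    duration: float = field(compare=False)


def schedule(jobs, machines):
    """Assign jobs earliest-deadline-first to the machine that is free soonest.

    Returns (assignments, missed), where assignments is a list of
    (job name, machine index, start, finish) tuples and missed lists the
    names of jobs that would finish after their deadline.
    """
    pending = list(jobs)
    heapq.heapify(pending)  # ordered by deadline only
    free_at = [(0.0, m) for m in range(machines)]
    heapq.heapify(free_at)  # ordered by time the machine becomes free

    assignments, missed = [], []
    while pending:
        job = heapq.heappop(pending)
        start, machine = heapq.heappop(free_at)
        finish = start + job.duration
        assignments.append((job.name, machine, start, finish))
        if finish > job.deadline:
            missed.append(job.name)  # this job would breach its deadline
        heapq.heappush(free_at, (finish, machine))
    return assignments, missed


if __name__ == "__main__":
    jobs = [
        Job(10.0, "a", "tenant-1", 4.0),
        Job(5.0, "b", "tenant-2", 3.0),
        Job(6.0, "c", "tenant-1", 3.0),
    ]
    plan, late = schedule(jobs, machines=2)
    for entry in plan:
        print(entry)
    print("missed deadlines:", late)
```

A non-empty `missed` list signals that the available machines cannot satisfy every deadline, which in this simplified model corresponds to the breach scenario discussed above; a real scheduler would also have to account for machine failures and redundancy.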
Moving the compute resources to a cloud computing environment presents significant challenges. A cloud computing environment can often guarantee a certain number of machines to provide the compute resources, but often cannot guarantee that the same machine will be fully operational for the entire lifetime of the job, or that proper network connectivity will be maintained for that entire lifetime. Thus, the virtual machines in the cloud computing environment may not be as stable as machines that the provider maintains itself.