The embodiments presented herein generally relate to improving the performance of High Performance Computing (HPC) applications on the Cloud by integrating application-level and cloud-level load balancing.
Effective optimization of load assignment on the Cloud needs to take into account the task requirements of the HPC application as well as the computational capacity and communication bandwidth of the Cloud resources. This disclosure proposes an approach for two-way transfer of essential information between the Cloud and HPC applications that results in better load assignment without violating network privacy.
HPC applications are mostly scientific applications (e.g., partial differential equation computations, computational fluid dynamics) that can be run on massively parallel architectures. An HPC application consists of a number of tasks, where each task performs some computation and different tasks communicate with one another. Normally, there are more tasks than available compute nodes to perform them. The tasks need to be mapped onto processors in the underlying parallel architecture such that the processing load on every processor is balanced and communication between different processors is minimized.
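By way of illustration only, the mapping problem described above can be sketched as a simple greedy heuristic. This is a minimal sketch under stated assumptions and is not the method of this disclosure: the function name map_tasks, the weighting parameter alpha, and the example task costs, communication volumes, and node capacities are all hypothetical. Each task is placed on the node that minimizes the node's resulting normalized load plus the weighted communication volume that would cross node boundaries.

```python
from collections import defaultdict


def map_tasks(task_cost, comm, capacity, alpha=1.0):
    """Greedily assign tasks to nodes (illustrative heuristic only).

    task_cost: {task: compute cost}
    comm:      {(task_a, task_b): communication volume}
    capacity:  {node: relative compute capacity}
    alpha:     hypothetical weight of communication relative to load
    """
    # Build a symmetric neighbour table from the edge list.
    neighbours = defaultdict(dict)
    for (a, b), vol in comm.items():
        neighbours[a][b] = vol
        neighbours[b][a] = vol

    load = {node: 0.0 for node in capacity}
    placement = {}

    # Place the heaviest tasks first so they are spread across nodes early.
    for task in sorted(task_cost, key=task_cost.get, reverse=True):

        def score(node):
            # Normalized load the node would carry if it took this task.
            new_load = (load[node] + task_cost[task]) / capacity[node]
            # Traffic to already-placed neighbours that would cross nodes.
            remote = sum(
                vol
                for other, vol in neighbours[task].items()
                if other in placement and placement[other] != node
            )
            return new_load + alpha * remote

        best = min(capacity, key=score)
        placement[task] = best
        load[best] += task_cost[task]

    return placement, load


# Hypothetical example: four tasks, two equal nodes, two communicating pairs.
example_placement, example_load = map_tasks(
    task_cost={"a": 4, "b": 4, "c": 2, "d": 2},
    comm={("a", "c"): 1.0, ("b", "d"): 1.0},
    capacity={"n0": 1.0, "n1": 1.0},
)
```

In this example the heuristic keeps each communicating pair on the same node while giving both nodes equal load. A heuristic of this kind depends on knowing the node capacities and communication costs in advance, which is the information an application typically lacks on the Cloud.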
With HPC on dedicated clusters, the computational capacity and communication bandwidth of the individual resources are known. In this case, applications can themselves optimize load assignment effectively. However, dedicated clusters have drawbacks: there are significant delays while waiting for the cluster to become available for exclusive use; resources cannot be added or removed in response to increased or decreased demand; and a dedicated cluster has a fixed configuration that might not be optimal for applications with varying computation or communication patterns.
In contrast, HPC on the Cloud (cloud computing) runs on resources whose computational capacity and communication bandwidth are heterogeneous and can change dynamically.
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. (“NIST.gov—Computer Security Division—Computer Security Resource Center”. Csrc.nist.gov.) Cloud computing provides computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Parallels to this concept can be drawn with the electricity grid, where end-users consume power without needing to understand the component devices or infrastructure required to provide the service. Cloud computing describes a new supplement, consumption, and delivery model for IT services based on Internet protocols, and it typically involves provisioning of dynamically scalable and often virtualized resources. It is a byproduct and consequence of the ease of access to remote computing sites provided by the Internet. This frequently takes the form of web-based tools or applications that users can access and use through a web browser as if they were programs installed locally on their own computers. (See <<en.wikipedia.org/wiki/Cloud_computing#Technical_description>>.)
Typical cloud computing providers deliver common business applications online that are accessed from another Web service or from software such as a Web browser, while the software and data are stored on servers. Most cloud computing infrastructures consist of services delivered through common centers and built on servers. Clouds often appear as single points of access for consumers' computing needs.
Typically, cloud load balancing is performed by monitoring current resource usage across the applications. The application, in turn, performs load balancing on its own with the resources available to it, over-provisioning in the cloud. Thus, load balancing is performed independently at two different levels (the cloud and the application) and hence is sub-optimal.