1. Technical Field
This disclosure generally relates to high performance computing (HPC) systems, and more specifically relates to dynamic job relocation of a job executing on a plurality of nodes in an HPC system.
2. Background Art
High performance computing systems, sometimes referred to as supercomputers, continue to be developed to tackle sophisticated computing jobs. These computers are particularly useful to scientists for high performance computing (HPC) applications including life sciences, financial modeling, hydrodynamics, quantum chemistry, molecular dynamics, astronomy, space research, and climate modeling. Supercomputer developers have focused on multi-node computers with massively parallel computer structures to meet the need for increasingly complex computation. The Blue Gene architecture is a massively parallel, multi-node computer system architecture developed by International Business Machines Corporation (IBM). References herein are directed to the Blue Gene/L system, which is a scalable system with 65,536 or more compute nodes. Each node consists of a single ASIC (application specific integrated circuit) and memory. Each node typically has 512 megabytes of local memory. The full computer is housed in 64 racks or cabinets with 32 node boards in each. Each node board has 32 processors and the associated memory for each processor. As used herein, a massively parallel computer system is a system with more than about 10,000 processor nodes.
Massively parallel computer systems like Blue Gene are expensive, and thus their utilization or throughput needs to be maximized to get the greatest possible amount of work through the system each hour. Typically there are jobs of varying size and runtime that need to be scheduled on the system. The job allocation needs to be properly managed to achieve the correct balance of throughput and response time. The response time to execute a particular job may suffer when maximizing the overall throughput, such that some users do not get a responsive system. With many prior art methods for job allocation, the available system partitions or contiguous node blocks can become sparse, and small system partition gaps can occur between jobs such that there is insufficient contiguous space to load a new job.
Techniques have been developed to defragment the blocks of resources so that more contiguous physical resources are available for a new job to begin execution. Jobs can sometimes be relocated to improve job allocation and free up contiguous space. However, the majority of applications or jobs that execute on an HPC system involve message passing between nodes, so they cannot simply be suspended and relocated at any time without losing data in transit between the nodes.
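The fragmentation problem described above can be illustrated with a minimal sketch. The sketch is not taken from this disclosure and rests on simplifying assumptions: nodes are modeled as a one-dimensional line rather than as actual system partitions, the job labels "A" and "B" and the helper names largest_free_block and compact are hypothetical, and relocation is treated as instantaneous, which ignores the in-transit messages that make relocation difficult in practice.

```python
from typing import List, Optional

# Assumption: one list entry per node, holding a job label or None if the node is free.
Nodes = List[Optional[str]]


def largest_free_block(nodes: Nodes) -> int:
    """Return the length of the longest run of consecutive free nodes."""
    best = run = 0
    for owner in nodes:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


def compact(nodes: Nodes) -> Nodes:
    """Slide every job toward index 0, preserving job order.

    All free nodes end up in a single block at the tail. Real relocation
    would also have to deal with messages in flight between nodes.
    """
    busy = [owner for owner in nodes if owner is not None]
    return busy + [None] * (len(nodes) - len(busy))


# Two 2-node jobs separated by gaps: 4 nodes are free, but never 4 in a row.
nodes: Nodes = ["A", "A", None, None, "B", "B", None, None]

free_total = nodes.count(None)
before = largest_free_block(nodes)
after = largest_free_block(compact(nodes))
print(free_total, before, after)
```

In this toy layout a new job needing four contiguous nodes cannot be loaded even though four nodes are free in total. After job "B" is moved next to job "A", the free nodes form a single block of four and the new job fits.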
This disclosure is directed to dynamically relocating a job executing on an HPC system, and in particular where the job includes message passing between nodes. Dynamic relocation can be used to defragment blocks of nodes to achieve better system optimization.