1. Field of the Invention
The present invention generally relates to parallel computing. More specifically, the present invention relates to managing blocks of computing resources on a clustered or distributed computing system.
2. Description of the Related Art
Powerful computers may be designed as highly parallel systems where the processing activity of hundreds, if not thousands, of processors (CPUs) is coordinated to perform computing tasks. These systems are highly useful for a broad variety of applications including financial modeling, hydrodynamics, quantum chemistry, astronomy, weather modeling and prediction, geological modeling, prime number factoring, and image processing (e.g., CGI animations and rendering), to name but a few examples.
For example, one family of parallel computing systems has been (and continues to be) developed by International Business Machines (IBM) under the name BLUE GENE® system. The BLUE GENE®/L architecture provides a scalable, parallel computer that may be configured with a maximum of 65,536 (2^16) compute nodes. Each compute node includes a single application-specific integrated circuit (ASIC) with 2 CPUs and memory. The BLUE GENE®/L architecture has been successful and on Oct. 27, 2005, IBM announced that a BLUE GENE®/L system had reached an operational speed of 280.6 teraflops (280.6 trillion floating-point operations per second), making it the fastest computer in the world at that time. Further, as of June 2005, BLUE GENE®/L system installations at various sites world-wide accounted for five of the ten most powerful computers in the world.
IBM is currently developing a successor to the BLUE GENE®/L system, named the BLUE GENE®/P system. The BLUE GENE®/P system is expected to be the first computer system to operate at a sustained 1 petaflops (1 quadrillion floating-point operations per second). Like the BLUE GENE®/L system, the BLUE GENE®/P system is scalable, allowing for configurations that include different numbers of racks.
In addition to the BLUE GENE® architecture developed by IBM, other highly parallel computer systems have been (and are being) developed. For example, a Beowulf cluster may be built from a collection of commodity off-the-shelf personal computers. In a Beowulf cluster, individual systems are connected using local area network technology (e.g., Ethernet) and system software is used to execute programs written for parallel processing on the cluster of individual systems. Another approach to parallel computing includes large distributed or grid-type computing systems which pool the computing power of hardware spread over widely dispersed locations.
In these parallel systems, it is possible for one user to use only a subset of the total available hardware resources. Collectively, the resources assigned to carry out a particular computing job are usually referred to as a “block.” In a BLUE GENE® system, for example, a “block” refers to a group of compute nodes assigned as a unit to perform a particular computing task. Given the number of nodes in most parallel systems, multiple jobs may be executed simultaneously on different blocks. Blocks can take on a wide variety of sizes and each block can have access to unique resources (e.g., a different amount of memory, a different communications network, or access to a different file system) based on the hardware and the location of the physical resources.
In some cases, the physical location of a block may have an effect on the system as a whole. That is, the fact that a given block is using a subset of hardware in a given physical location can, and often does, affect the usability of the remaining physical hardware at that location or the system as a whole. In some cases, this is due to the way that distributed and clustered computing systems are configured to communicate between the nodes.
More generally, a block using one set of resources may affect the system as a whole, as well as its other users. For example, computing blocks may be allocated such that the resources in use leave too few physical resources available to meet the demands of other users on the system. That is, the compute nodes of the parallel system can become fragmented by multiple jobs running at different locations on the system. For example, a distributed or clustered computing system may become fragmented when a number of small jobs are assigned to blocks in a manner that prevents a large contiguous block of resources from being utilized. Depending on how the blocks are assigned, and how long computing jobs need to execute, a job that requires a large block of compute nodes may sometimes have to wait to be assigned a block, even though the system as a whole has the needed number of compute nodes available. Further, if new smaller jobs continue to be submitted for execution, then the larger job may starve while the smaller jobs are executed, further fragmenting the resources of the parallel system.
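The fragmentation problem described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the compute nodes form a one-dimensional line and that a block must occupy consecutive nodes; actual systems may use other topologies. The function names (`free_count`, `largest_free_run`, `can_place`, `compact`) and the 16-node layout are hypothetical and do not come from any particular system, and the compaction step is shown only to make the idea of defragmentation concrete, not as a description of any specific technique.

```python
from typing import List, Optional

# Each entry is the job that owns the node, or None if the node is free.
Nodes = List[Optional[str]]


def free_count(nodes: Nodes) -> int:
    """Total number of free nodes, regardless of where they are located."""
    return sum(1 for owner in nodes if owner is None)


def largest_free_run(nodes: Nodes) -> int:
    """Length of the longest run of consecutive free nodes."""
    best = run = 0
    for owner in nodes:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


def can_place(nodes: Nodes, size: int) -> bool:
    """A block of `size` nodes fits only if a contiguous free run is that long."""
    return largest_free_run(nodes) >= size


def compact(nodes: Nodes) -> Nodes:
    """Hypothetical defragmentation: pack all busy nodes to the front."""
    busy = [owner for owner in nodes if owner is not None]
    return busy + [None] * (len(nodes) - len(busy))


# Assumed layout: 16 nodes with four 2-node jobs scattered across them.
nodes: Nodes = [None] * 16
for job, start in (("A", 2), ("B", 6), ("C", 10), ("D", 14)):
    nodes[start:start + 2] = [job] * 2

print(free_count(nodes))              # total free nodes
print(largest_free_run(nodes))        # longest contiguous free run
print(can_place(nodes, 8))            # can an 8-node block be assigned now?
print(can_place(compact(nodes), 8))   # and after packing the small jobs?
```

In this layout half of the nodes are free, yet no run of free nodes is longer than two, so an 8-node job would have to wait. Once the small jobs are packed together, the same eight free nodes become contiguous and the larger block can be assigned.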
Accordingly, there remains a need for techniques for defragmenting blocks in a clustered or distributed computing system.