1. Field of the Invention
Embodiments of the invention relate to improving the efficiency of multi-node computing systems. More specifically, embodiments of the invention may be configured to improve performance on a multi-node computing system by selectively compiling source code to native instructions among compute nodes of such a system.
2. Description of the Related Art
Powerful computers may be designed as highly parallel systems where the processing activity of thousands of processors (CPUs) is coordinated to perform computing tasks. These systems are highly useful for a broad variety of applications, including financial modeling, hydrodynamics, quantum chemistry, astronomy, weather modeling and prediction, geological modeling, prime number factoring, and image processing (e.g., CGI animations and rendering), to name but a few examples.
For example, one family of parallel computing systems has been (and continues to be) developed by International Business Machines (IBM) under the name Blue Gene®. The Blue Gene/L architecture provides a scalable, parallel computer that may be configured with a maximum of 65,536 (2^16) compute nodes. Each compute node includes a single application-specific integrated circuit (ASIC) with two CPUs and memory. The Blue Gene/L architecture has been successful: on Oct. 27, 2005, IBM announced that a Blue Gene/L system had reached an operational speed of 280.6 teraflops (280.6 trillion floating-point operations per second), making it the fastest computer in the world at that time. Further, as of June 2005, Blue Gene/L installations at various sites worldwide accounted for five of the ten most powerful computers in the world.
The compute nodes in a parallel system typically communicate with one another over multiple communication networks. For example, the compute nodes of a Blue Gene/L system are interconnected using five specialized networks. The primary communication strategy for the Blue Gene/L system is message passing over a torus network (i.e., a set of point-to-point links between pairs of nodes). The torus network allows application programs developed for parallel processing systems to use high-level interfaces such as the Message Passing Interface (MPI) and the Aggregate Remote Memory Copy Interface (ARMCI) to perform computing tasks and distribute data among a set of compute nodes. Of course, other message passing interfaces have been (and are being) developed. Additionally, the Blue Gene/L system includes both a collective network and a global interrupt network. Further, certain nodes are also connected to a Gigabit Ethernet network. These nodes are typically used to perform I/O operations between the Blue Gene core and an external entity such as a file server. Other massively parallel architectures also use multiple, independent networks to connect compute nodes to one another.
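To make the notion of a torus of point-to-point links concrete, the following is a minimal sketch of how nodes might be addressed on such a network. It assumes a three-dimensional torus; the 64 x 32 x 32 dimensions are an illustrative assumption chosen only so that the node count matches the 65,536 figure above, and the function names `torus_neighbors` and `torus_hops` are hypothetical. Each node links directly to its two neighbors along every dimension, and the links wrap around at the edges.

```python
# Minimal sketch: neighbor addressing on a 3D torus interconnect.
# ASSUMPTION: a 3D torus with illustrative dimensions; function names are
# hypothetical and not part of any Blue Gene software interface.
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes directly linked to `node`: one step in the minus
    and plus direction along each axis, wrapping around at the edges.
    (Yields six distinct nodes when every dimension has at least 3 nodes.)"""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of point-to-point hops between two nodes,
    going the shorter way around each ring."""
    total = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, size - d)
    return total


if __name__ == "__main__":
    dims = (64, 32, 32)                       # illustrative dimensions
    print(dims[0] * dims[1] * dims[2])        # 65536 nodes
    print(torus_neighbors((0, 0, 0), dims))   # wraps to 63 and 31 at the edges
    print(torus_hops((0, 0, 0), (63, 16, 1), dims))  # 1 + 16 + 1 = 18
```

The wraparound is what distinguishes a torus from a simple mesh: a node at the edge of one dimension (for example, coordinate 0) is one hop away from the node at the opposite edge (coordinate 63), which bounds the worst-case distance between any two nodes.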
Massively parallel systems such as the Blue Gene architecture were originally designed to support a SIMD (Single Instruction, Multiple Data) programming paradigm. This typically involves running one large-scale, tightly coupled MPI-based application across all of the compute nodes in a partition. In comparison to other available packaging strategies, the Blue Gene packaging delivers many teraflops per rack, provides a large memory footprint, and consumes little power. These characteristics also make the Blue Gene architecture attractive for a High Throughput Computing (HTC) model. HTC provides a computing model that allows independent work units to run on each node. Under this model, a launcher program resides on each compute node of a massively parallel system. The launcher program listens for work requests from a scheduler, performs each request, and then restarts. In such a case, each node in the system executes the same program but may execute a different portion of it, depending on the work request taken up by that node. The scheduler is generally an external program that transfers work requests to the launchers on the compute nodes.
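The HTC interaction between an external scheduler and the per-node launchers can be sketched as follows. This is a minimal illustration under stated assumptions, not an actual Blue Gene interface: threads stand in for compute nodes, a shared queue stands in for the channel that carries work requests, the "program" simply squares its input, and all names (`program`, `launcher`, `scheduler`) are hypothetical. Each launcher runs the same program, listens for a request, performs it, and returns to listening, which approximates the "restart" step described above.

```python
# Minimal sketch of the HTC model: an external scheduler hands independent
# work requests to a launcher on each "node".
# ASSUMPTIONS: threads stand in for compute nodes and a queue stands in for
# the request channel; all names are hypothetical.
import queue
import threading
from typing import Dict, List, Optional


def program(work_unit: int) -> int:
    """The same program runs on every node; only its input differs.
    Squaring the input is a placeholder for real work."""
    return work_unit * work_unit


def launcher(requests: "queue.Queue[Optional[int]]",
             results: Dict[int, int], lock: threading.Lock) -> None:
    """Listen for a work request, perform it, then go back to listening."""
    while True:
        work_unit = requests.get()      # block until the scheduler sends work
        if work_unit is None:           # sentinel: no more work for this node
            break
        value = program(work_unit)
        with lock:                      # results are shared across launchers
            results[work_unit] = value


def scheduler(work_units: List[int], num_nodes: int) -> Dict[int, int]:
    """Start one launcher per node, queue every work unit, then send one
    stop sentinel per node and wait for all launchers to finish."""
    requests: "queue.Queue[Optional[int]]" = queue.Queue()
    results: Dict[int, int] = {}
    lock = threading.Lock()
    nodes = [threading.Thread(target=launcher, args=(requests, results, lock))
             for _ in range(num_nodes)]
    for node in nodes:
        node.start()
    for unit in work_units:
        requests.put(unit)
    for _ in nodes:                     # sentinels follow all real work (FIFO)
        requests.put(None)
    for node in nodes:
        node.join()
    return results


if __name__ == "__main__":
    print(sorted(scheduler(list(range(8)), num_nodes=4).items()))
```

Because the queue is first-in, first-out and the sentinels are enqueued after every work unit, all requests are processed before any launcher stops; which launcher handles which request is left to whichever one is idle, reflecting the independence of work units under HTC.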