1. Field of the Invention
The present invention generally relates to computer systems and development. More specifically, the present invention relates to a process for reducing the time required to load a program for execution in a distributed or highly parallelized computer system.
2. Description of the Related Art
Supercomputer systems continue to be developed to tackle increasingly complex computing problems. These systems have proved particularly useful for a broad variety of applications, including life sciences, financial modeling, hydrodynamics, quantum chemistry, molecular dynamics, astronomy, weather modeling and prediction, and geological modeling. Supercomputer developers have focused on massively parallel computer architectures to provide ever-increasing amounts of computational power for these and other applications.
One family of massively parallel systems has been (and continues to be) developed by International Business Machines (IBM) under the name Blue Gene. The Blue Gene/L system is a scalable system that may be configured with a maximum of 65,536 (2^16) compute nodes. Each Blue Gene/L node includes a single application specific integrated circuit (ASIC) with two CPUs and memory. The Blue Gene architecture has been extremely successful: on Oct. 27, 2005, IBM announced that a Blue Gene/L system had reached an operational speed of 280.6 teraflops (280.6 trillion floating-point operations per second), making it the fastest computer in the world at the time. Further, as of June 2005, Blue Gene/L installations at various sites worldwide accounted for five of the ten most powerful computers in the world.
IBM is currently developing a successor to the Blue Gene/L system, named Blue Gene/P. Blue Gene/P is expected to be the first computer system to operate at a sustained 1 petaflops (1 quadrillion floating-point operations per second). Like Blue Gene/L, Blue Gene/P is a scalable system, with a projected maximum of 73,728 compute nodes. Each Blue Gene/P node includes a single ASIC with four CPUs and memory. A complete Blue Gene/P system would be housed in 72 racks or cabinets, each with 32 node boards (with 32 nodes per board).
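The projected node count follows directly from the packaging figures above. Multiplying by the four CPUs per node gives the total CPU count for a complete system; that second figure is derived here and is not stated in the original text:

```latex
\begin{aligned}
N_{\text{nodes}} &= 72~\text{racks} \times 32~\text{boards/rack} \times 32~\text{nodes/board} = 73{,}728 \\
N_{\text{CPUs}}  &= 4 \times N_{\text{nodes}} = 4 \times 73{,}728 = 294{,}912
\end{aligned}
```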
In addition to the Blue Gene architecture developed by IBM, other distributed computer systems may have an overall architecture similar to that of a massively parallel computer system. Examples of such systems include clustered systems and grid-based systems. For example, a Beowulf cluster is a group of computer systems, each running a Unix-like operating system such as the Linux® or BSD operating systems. The computer systems in the cluster are connected over high-speed networks into a small TCP/IP LAN and have libraries and programs installed that allow processing to be shared among the nodes.
In many of the applications described above, supercomputer systems are used to solve problems that involve performing essentially the same calculations on different data sets. One example of this type of application is modeling molecular interactions, such as simulating the folding of an individual protein. For these types of applications, the program executing on any given node uses a relatively small amount of data but performs many calculations involving that data. When finished, the program returns the results of its calculations. Because thousands of nodes perform the same calculations (on different data sets), extremely large datasets may be processed in a relatively short period of time.
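The pattern described above, in which every node runs the same program on its own small data set, can be sketched as follows. This is an illustrative sketch only, not the method of the invention: a thread pool stands in for compute nodes, and `node_program` and `run_on_nodes` are hypothetical names with a toy calculation in place of a real scientific kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def node_program(data_set):
    """Hypothetical per-node program: many calculations on a small data set."""
    total = 0.0
    for _ in range(1000):        # repeat the same arithmetic many times
        for x in data_set:
            total += x * x       # toy stand-in for a real computation
    return total


def run_on_nodes(data_sets, num_nodes=4):
    """Run the same program on every data set; threads stand in for nodes."""
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # map() returns results in the same order as the input data sets
        return list(pool.map(node_program, data_sets))


if __name__ == "__main__":
    data_sets = [[float(i), float(i + 1)] for i in range(8)]
    print(run_on_nodes(data_sets))
```

Only the input differs from one worker to the next; the program itself is identical, which is why every node must first receive a copy of it.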
Given the number of nodes in a highly parallelized supercomputer such as a Blue Gene system, or in other distributed systems, an operation that imposes even a small overhead on each individual node can translate into a large amount of time for the system as a whole. For example, the collective time that individual compute nodes spend loading a program can be significant; in aggregate, a substantial amount of time may be expended simply transmitting a program to each compute node. The same phenomenon occurs in distributed systems where datasets, programs, and the like must be transmitted to the processing nodes that make up the distributed system. Accordingly, there is a need in the art for techniques that reduce the time required to load a program in highly parallelized or distributed computer systems.
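As a purely illustrative calculation, assume a hypothetical per-node load time of t_load = 0.5 s (an assumed value, not a measured figure) on a system with the Blue Gene/L maximum of N = 65,536 nodes. The aggregate node-time spent loading the program is then:

```latex
\begin{aligned}
T_{\text{total}} &= N \times t_{\text{load}} = 65{,}536 \times 0.5~\text{s} = 32{,}768~\text{s} \approx 9.1~\text{h}
\end{aligned}
```

Because the total scales linearly with N, any per-node saving is multiplied the same way: under the same assumptions, halving t_load would recover 16,384 s (roughly 4.6 hours) of aggregate node-time.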