The present invention relates in general to parallel processing and in particular to methods for scalably exploiting parallelism in a parallel processing system.
“Parallel processing” refers to the ability of computer systems to execute two or more operations at the same time. Numerous systems possessing varying kinds and degrees of parallel processing capability (or parallelism) have been developed over the years. These include multiple-instruction, multiple-data (MIMD) systems, which are capable of executing multiple different instructions in parallel on multiple input data values, as well as single-instruction, multiple-data (SIMD) systems, which execute the same instruction on multiple input data values in parallel.
Conventionally, exploiting parallelism in a computing system requires that the programmer or compiler be aware of the available parallelism. In one programming model, the programmer (or compiler) knows what parallel processing capability a particular system has and creates code that explicitly distributes the work across the parallel processing hardware. For instance, in a system with two processing cores, the program code would include explicit instructions to spawn new processes or threads and to assign processes or threads to specific processing cores (which may be in the same processor or different processors). Such instructions can be inserted by the programmer or by a compiler based on configuration information for a particular system.
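By way of illustration only, the explicit-distribution model described above might look like the following minimal sketch. The names `NUM_CORES`, `sum_range`, and `parallel_sum_two_core` are hypothetical and do not come from any particular system. Threads stand in for per-core workers, and the platform-specific step of assigning each thread to a specific core is omitted.

```python
import threading

# Assumption baked into the program: the target system has exactly two cores.
NUM_CORES = 2


def sum_range(data, start, stop, results, slot):
    # Each worker sums its own slice and stores the partial result in its slot.
    results[slot] = sum(data[start:stop])


def parallel_sum_two_core(data):
    # The split point is chosen for exactly two workers; a different core
    # count would require rewriting this function.
    mid = len(data) // 2
    results = [0] * NUM_CORES
    workers = [
        threading.Thread(target=sum_range, args=(data, 0, mid, results, 0)),
        threading.Thread(target=sum_range, args=(data, mid, len(data), results, 1)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # Combine the two partial sums into the final result.
    return sum(results)
```

Because the number of workers and the partitioning of the data are fixed in the code itself, the sketch would create only two workers no matter how many cores the system running it provides.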
Code generated in this manner is not scalable, meaning that it is not readily transportable to other systems with different degrees or kinds of parallelism. For instance, code specifically written (or compiled) for a single core processor can be executed on a dual-core processor, but the code will use only one of the cores, resulting in inefficiency to the extent that the code includes tasks that could be done in parallel. To exploit the parallelism provided by the second core, the code would have to be rewritten (or at least recompiled) for a dual-core system. Similarly, code specific to a two-core system would have to be rewritten and/or recompiled to exploit the higher degree of parallelism provided in a four-core system, and so on. Scaling in the other direction is also problematic, as code written and compiled for a system with a number C of cores will generally not be executable on a system with fewer than C cores; such code would need to be rewritten and/or recompiled in order to execute at all.
A more scalable model is sometimes used in server farms, where incoming processing tasks are distributed among multiple servers based on server availability. In some farms, there is a centralized work manager that automatically directs each incoming task to one or another of the servers, which executes the task. The work manager must be programmed with information about the number and capacity of the various servers, but this information does not need to be in the program code that defines the tasks to be performed. Further, the task request need not specify a particular server; thus, the programmer or process that is the source of processing tasks need not be aware of the number of servers in the farm.
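A centralized work manager of the kind described above could be sketched roughly as follows. This is an illustrative assumption rather than a description of any actual server farm: the class `WorkManager`, its methods, and the policy of choosing the server with the most free capacity are all hypothetical. The point of the sketch is that the server names and capacities are known only to the manager, while the caller submits a task without naming a server.

```python
class WorkManager:
    def __init__(self, capacities):
        # capacities maps a server name to the maximum number of tasks it can
        # run concurrently; only the manager holds this information.
        self.capacities = dict(capacities)
        self.active = {name: 0 for name in self.capacities}

    def assign(self):
        # Pick the server with the most free capacity (hypothetical policy).
        free = {n: self.capacities[n] - self.active[n] for n in self.capacities}
        name = max(free, key=free.get)
        if free[name] <= 0:
            raise RuntimeError("no server available")
        self.active[name] += 1
        return name

    def release(self, name):
        # Mark one task on the named server as finished.
        self.active[name] -= 1

    def run(self, task):
        # The caller supplies only the task; the manager chooses the server.
        # Here the task is simply called locally to stand in for remote execution.
        name = self.assign()
        try:
            return name, task()
        finally:
            self.release(name)
```

A task source would call something like `WorkManager({"a": 1, "b": 2}).run(some_task)` and would not need to change if servers were added to or removed from the manager's table.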
Within each processing task, however, the scalability problem persists. Any parallelism that might be present in a particular server is exploited only to the extent that the code associated with the processing task explicitly distributes the work. Thus, the code must still be written and/or compiled for a specific parallel processing configuration and must be rewritten and/or recompiled to obtain maximum efficiency in a different configuration.
It would therefore be desirable to provide techniques for scalably exploiting parallelism in a parallel processing subsystem.