Since the advent of computing systems, a prominent goal has been to speed up program execution. On uniprocessing systems, the preferred way has been to design faster electronic components to speed electrical signals through the system. For such systems, theoretical limits, such as the speed of light, place an upper bound on the speed of program execution. To extend beyond such limitations, multiprocessing of user programs takes advantage of the processing power of several central processing units (CPUs) concurrently executing a single program.
Multiprocessing is possible when some sections of program code are independent from other sections in their order of execution. In that case, multiple CPUs may execute these sections of code concurrently. Ideally, a program would run N times faster if N CPUs were simultaneously executing the program code. However, this best case is not possible for a variety of theoretical and practical reasons.
Theoretically, a program will not achieve this maximal speed-up if there are sections of code that must be executed in a non-trivial partial ordering; for example, some sections of code must wait for other sections to finish executing in order to use their results. These data dependencies dictate how much a given program can be sped up by having more than one CPU executing its code concurrently. They also indicate that a given program may go through sections of code that can use at most one CPU and other sections where multiple CPUs may be used. This transition between single CPU mode and multiple CPU mode for a given program creates practical reasons why programs, in general, do not achieve their maximal speed-up.
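The effect of such dependencies can be illustrated with the standard serial-fraction model commonly known as Amdahl's law. The following is a minimal sketch, not part of the original discussion: the function name `speedup` and the parameter `serial_fraction` (the share of execution time that must run on one CPU) are assumptions introduced only for illustration.

```python
def speedup(serial_fraction, n_cpus):
    """Idealized speed-up on n_cpus when a fixed fraction of the work is serial.

    Assumption: the remaining (1 - serial_fraction) of the work divides
    perfectly across the CPUs and there is no request or scheduling overhead.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)


if __name__ == "__main__":
    # Compare a fully parallel program with one that is 10% serial.
    for fraction in (0.0, 0.1):
        print(fraction, round(speedup(fraction, 8), 3))
```

Under this model, only a program with no serial sections reaches the ideal factor of N; any serial fraction caps the speed-up no matter how many CPUs are attached.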
One practical reason involves communication between the user program and the host Operating System. In a multiprogramming environment (i.e. several user programs simultaneously compete for computing resources) it is usually inefficient to allow one program to keep multiple CPUs attached to it while that program is in single CPU mode. These idle CPUs might better serve system throughput by working on other programs until that program returns to multiple CPU mode. When the program does return to multiple CPU mode, however, it needs to request additional CPUs from the Operating System. In the prior art, the user program and the Operating System have historically had two avenues for communication: the system call, which is a request by the program for services, and the interrupt, which is a mechanism by which the Operating System reports certain information to the user program. Neither mechanism provides the high-speed communication needed for efficient multi-tasking in a multiprogramming environment. Without a method of high-speed communication, a computing system is far from achieving either maximal speed-up or efficient system throughput.
A problem occurs when user programs request additional CPUs to process sections of code that are sufficiently small. In the worst case, the time it takes for a single CPU to execute the entire section of code might be the same as or less than the time it takes to request extra CPUs for assistance. Processing these requests decreases system throughput. Also, program execution suffers if the program slows down to request and wait for additional CPUs. As a result, exploitable parallelism must occur on a relatively coarse grain of program structure, and opportunities for multiprocessing are lost.
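The granularity problem can be made concrete with a simple cost model. The sketch below assumes that requesting CPUs costs a fixed overhead and that the work then divides evenly; the names `parallel_time`, `worth_requesting`, and `break_even_work`, and the time units, are hypothetical and chosen only for illustration.

```python
def parallel_time(work, n_cpus, request_overhead):
    """Time to finish a section if extra CPUs are requested first.

    Assumption: a fixed request_overhead is paid once, after which the
    work splits evenly across n_cpus.
    """
    return request_overhead + work / n_cpus


def worth_requesting(work, n_cpus, request_overhead):
    """True if requesting CPUs beats running the section on a single CPU."""
    return parallel_time(work, n_cpus, request_overhead) < work


def break_even_work(request_overhead, n_cpus):
    """Section size at which parallel and single-CPU execution take equal time.

    Solves request_overhead + work / n_cpus == work for work.
    """
    if n_cpus < 2:
        raise ValueError("need at least two CPUs for a break-even point")
    return request_overhead * n_cpus / (n_cpus - 1)


if __name__ == "__main__":
    overhead = 300  # hypothetical cost of one request, in arbitrary time units
    for work in (300, 1000):
        print(work, worth_requesting(work, 4, overhead))
    print(break_even_work(overhead, 4))
```

In this model, any section smaller than the break-even size runs faster on a single CPU, which is why a slow request mechanism forces parallelism to a coarse grain.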
Another problem occurs with Operating System interrupts. If the Operating System needs to disconnect a process that is performing useful parallel work in order to connect a process that is currently of higher priority, the context of the process being disconnected must be saved. This interruption introduces two inefficiencies.
First, the saving of context is additional overhead. However, the saving and restoring of context is not necessary for some types of parallel work. For example, suppose a process, tasked to execute iterations of a Parallel DO Loop, is interrupted by the Operating System. If this interruption occurs at the end of a particular iteration, then the process would have finished its useful work and returned its results to common memory. In that case, no context needs to be saved to restart the process later.
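The point about iteration boundaries can be illustrated with a self-scheduled loop in which each worker claims the next iteration index from a shared counter. This is a minimal sketch under stated assumptions, not the mechanism of any particular system: `parallel_do` and `max_iters_per_worker` are hypothetical names, Python threads stand in for processes, and a worker that stops after a fixed number of iterations models a process disconnected at the end of an iteration.

```python
import threading


def parallel_do(n_iterations, body, max_iters_per_worker):
    """Run body(i) for i in range(n_iterations) using self-scheduling workers.

    max_iters_per_worker has one entry per worker: an integer limit models a
    worker that is withdrawn at an iteration boundary after that many
    iterations; None means the worker runs until no iterations remain.
    """
    results = [None] * n_iterations  # stands in for common memory
    next_index = 0
    lock = threading.Lock()

    def worker(limit):
        nonlocal next_index
        done = 0
        while limit is None or done < limit:
            with lock:  # claim the next unstarted iteration
                if next_index >= n_iterations:
                    return
                i = next_index
                next_index += 1
            results[i] = body(i)  # result goes straight to common memory
            done += 1
        # Stopping here is safe: the worker holds no partial iteration,
        # so there is nothing to save or restore.

    threads = [threading.Thread(target=worker, args=(limit,))
               for limit in max_iters_per_worker]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # One worker is withdrawn after two iterations; the other finishes the loop.
    print(parallel_do(10, lambda i: i * i, [2, None]))
```

Because every completed iteration has already written its result, the remaining worker simply claims whatever indices are left, and the withdrawn worker never needs to be resumed.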
Second, the user program may have to wait for the interrupted process to return before continuing with useful work. Generally, no work beyond the parallel region can be started until all the work in the parallel region has been completed. This is necessary to ensure program correctness. If the interrupted process has not returned to continue its work, other processes that have finished their parallel work are forced to wait.
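The cost of this wait can be sketched with a small calculation. Assuming each process reaches the end of the parallel region at a known time, the region can be left only when the slowest process arrives; the function name `barrier_wait_times` and the sample times are hypothetical and serve only as an illustration.

```python
def barrier_wait_times(finish_times):
    """Return the release time of the parallel region and each process's idle time.

    Assumption: no process may continue past the region until every process
    has finished, so the release time is the latest finish time.
    """
    if not finish_times:
        raise ValueError("at least one process is required")
    release = max(finish_times)
    return release, [release - t for t in finish_times]


if __name__ == "__main__":
    # Three processes finish at time 10; a fourth is interrupted and
    # does not finish its share until time 45 (hypothetical values).
    release, idle = barrier_wait_times([10, 10, 10, 45])
    print(release, idle)
```

In this example the three punctual processes each sit idle for 35 time units, so a single interrupted process delays the whole program.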