1. Field of the Invention
The present invention relates to reducing inter-task latency in a multi-processor system. More particularly, this invention relates to reducing inter-task latency in a multiprocessor system on which software is executed which includes at least one synchronous remote procedure call.
2. Description of the Prior Art
Remote procedure calls (RPCs) are a known technique for programming multiprocessor systems. An RPC typically allows a program executing on one processor to cause a task to be executed by another processor in the multiprocessor system. In U.S. patent application Ser. No. 11/976,315 the concept of using RPCs to cause the execution of tasks on accelerators (such as DMA engines, data engines etc.) in the multiprocessor system is discussed.
RPCs may be categorised as either synchronous RPCs or asynchronous RPCs. From a programming point of view a synchronous RPC is the simpler of the two and operates much like a function call, except that the function is performed remotely on another processor or engine, as is illustrated in FIG. 1A. In this example, the control processor (CP) is performing a sequence of operations in accordance with the instructions it is executing and at point 10 begins execution of a synchronous RPC, which comprises triggering a remote processor (RP) to perform function A. CP sets up the required inputs to function A, for example by placing them in a memory space that RP can access, and then triggers RP to begin execution of function A by sending an appropriate signal (illustrated here as “do A”). Hence at 12, RP begins execution of function A. Whilst RP is executing function A, CP waits until it receives a completion signal from RP. RP completes function A at 14 and sends the completion signal (“done A”) to CP. At 16, CP wakes up in response to the completion signal from RP, reads the outputs of function A (which RP has placed in a memory space accessible to CP) and continues execution of its sequence of instructions.
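Purely by way of illustration, the synchronous RPC flow described above may be modelled in Python, with a worker thread standing in for RP and an event standing in for the “done A” signal. The class and method names here are hypothetical and are not part of any system described in the referenced applications:

```python
import threading
import queue

class RemoteProcessor:
    """Illustrative model of a remote processor (RP): a worker thread
    that executes one requested function at a time and signals completion."""
    def __init__(self):
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            func, args, result, done = self._requests.get()
            result.append(func(*args))   # RP executes the function (12 to 14)
            done.set()                   # "done" signal sent back to CP (14)

    def call_sync(self, func, *args):
        """Synchronous RPC: CP blocks until RP signals completion."""
        result, done = [], threading.Event()
        self._requests.put((func, args, result, done))  # "do A" (10)
        done.wait()                                     # CP waits (10 to 16)
        return result[0]                                # CP reads the output (16)

rp = RemoteProcessor()
print(rp.call_sync(lambda x: x + 1, 41))  # CP resumes only after "done"
```

The shared `result` list plays the role of the memory space through which RP returns its outputs to CP.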
The feature of a synchronous RPC which makes it “synchronous” is the fact that the control processor waits for the remote processor before continuing execution (i.e. CP waits between 10 and 16 in FIG. 1A). This synchronisation between the control and remote processors makes synchronous RPCs easier to program, because there is no parallelism between the two processors. However, this lack of parallelism also results in some inter-task latency between tasks carried out in such a multiprocessor system.
For example, as illustrated in FIG. 1B, if CP executes a first synchronous RPC to cause RP to execute function A, and once it receives the signal “done A” from RP it executes a second synchronous RPC to cause RP to execute function B, there will be a delay between 14 (when RP sends signal “done A” to CP), 16 (when CP receives this signal), 18 (when CP executes the second synchronous RPC and sends signal “do B” to RP), and 20 (when RP begins execution of function B). Finally, once RP completes function B at 22 and signals CP with “done B”, and CP has received this signal at 24, CP can recommence execution of its instruction sequence.
The inter-task latency associated with synchronous RPCs can be reduced by exploiting parallelism between CP and RP, as illustrated in FIGS. 2A, 2B, 2C and 2D, and instead making use of asynchronous RPCs. In particular, if RP supports a task queue which allows multiple RPCs to be queued ready for execution, this allows CP to place multiple RPC requests into the task queue, which can have various benefits as explained in the following.
FIG. 2A illustrates a situation in which CP requires two functions A and B to be executed by RP (as in FIG. 1B). RP in this example has a task queue which can accept a pending task to be executed whilst a current task is already executing. Hence, having initiated function A at 30, at 32 CP is able to signal RP to execute function B, this pending task being placed in RP's task queue, such that at 34 (when function A completes) RP can both signal CP that function A has completed (“done A”) and immediately initiate execution of function B. When function B completes at 36, RP signals CP that B has completed (“done B”) and on receipt of this signal at 38 CP continues execution of its sequence of instructions. Hence it can be seen that the use of asynchronous RPCs can substantially reduce the inter-task latency when a sequence of RPCs is executed on a single remote processor.
If CP wishes to perform a sequence of N synchronous RPCs, it must wait for each task to complete and therefore the RP must signal N times and CP must wait N times. However, as illustrated in FIG. 2B, when using asynchronous RPCs with a task queue with a capacity of N (i.e. the task queue is capable of holding N RPC requests) CP need only wait for the last task to complete. As illustrated in FIG. 2B, at 40 CP signals to RP that N RPCs should be added to its task queue and at 42 RP begins execution of this series of RPCs. On completion of RPCN at 44, RP signals that the last RPC is complete to CP, which at 46 continues execution of its sequence of instructions. The cost associated with signalling and waiting can be significant, so using asynchronous RPCs can reduce the task invocation overhead by up to N times.
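The queued asynchronous scheme of FIGS. 2A and 2B may be sketched, again purely for illustration, as a worker thread (RP) draining a task queue while CP waits only for the completion event of the last queued task. The names used are hypothetical:

```python
import threading
import queue

class QueuedRemoteProcessor:
    """Illustrative model of an RP with a task queue: a pending RPC starts
    as soon as the current one completes, with no round trip back to CP."""
    def __init__(self):
        self._tasks = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            func, done = self._tasks.get()
            func()        # RP executes the queued task
            done.set()    # completion signal for this task

    def call_async(self, func):
        """Asynchronous RPC: enqueue the task and return its completion event."""
        done = threading.Event()
        self._tasks.put((func, done))
        return done

order = []
rp = QueuedRemoteProcessor()
# CP queues N tasks up front (40 in FIG. 2B) and waits only for the last (46).
events = [rp.call_async(lambda i=i: order.append(i)) for i in range(4)]
events[-1].wait()
print(order)  # tasks ran back to back on RP: [0, 1, 2, 3]
```

Because a single worker thread drains the queue, the tasks execute back to back on RP, and CP incurs only one wait rather than N.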
The multiprocessor system can consist of more than one remote processor, as illustrated in FIG. 2C. In the situation where CP wishes to perform an RPC on RP1 and then to execute another RPC on RP2, this can be done (using synchronous or asynchronous RPCs) by causing RP1 to signal CP when the first RPC completes, and CP then initiating the second RPC on RP2. However, if the two remote processors are able to signal each other, then asynchronous RPCs allow CP to initiate the first RPC on RP1, further indicating to RP1 that it should signal RP2 when it has completed its task. At the same time CP can queue up the second RPC on RP2, indicating that it should not start execution until the signal from RP1 arrives. This way of executing is only possible if asynchronous RPCs are used and if a suitable signalling mechanism exists, as illustrated in FIG. 2C. At 50, CP signals RP1 (“do A”) and further signals RP2 to begin function B when RP2 receives “done A” from RP1 (i.e. CP sends “do B when done A received”). At 52, when RP1 completes A, it signals this fact to RP2 (“done A”) and at 54 RP2 begins execution of function B. At 56, when RP2 has completed function B, it signals this fact to CP (“done B”) and at 58 CP continues execution of its sequence of instructions. Hence it can be seen that the use of asynchronous RPCs allows inter-task latency to be reduced, even if the sequence of RPCs is spread across multiple processors.
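The cross-processor signalling of FIG. 2C may likewise be sketched with two threads standing in for RP1 and RP2, and an event standing in for the “done A” signal sent directly from RP1 to RP2. This is an illustrative model only, not the signalling mechanism of any particular system:

```python
import threading

order = []
done_a = threading.Event()   # "done A": signal from RP1 directly to RP2
done_b = threading.Event()   # "done B": signal from RP2 to CP

def rp1():
    order.append("A")   # RP1 executes function A (52)
    done_a.set()        # "done A" sent to RP2, not routed through CP

def rp2():
    done_a.wait()       # RP2 holds "do B" until "done A" arrives
    order.append("B")   # RP2 executes function B (54)
    done_b.set()        # "done B" sent to CP (56)

# CP issues both RPCs up front (50 in FIG. 2C), then waits only for B (58).
threading.Thread(target=rp1).start()
threading.Thread(target=rp2).start()
done_b.wait()
print(order)  # ['A', 'B'] — B started without CP in the loop
```

Note that CP is not involved between the completion of A and the start of B, which is precisely the source of the latency reduction described above.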
If CP wishes to perform two synchronous RPCs on different remote processors in parallel with each other, the only mechanism available is to execute two parallel threads, each of which performs a synchronous RPC. However, using asynchronous RPCs CP can (in one thread) start two RPCs and wait for both to complete. As illustrated in FIG. 2D, at 60 CP signals RP1 to execute function A (“do A”) and signals RP2 to execute function B (“do B”). Hence at 62 RP1 begins execution of function A and at 64 RP2 begins execution of function B. At 66 RP1 completes execution of function A and signals this fact (“done A”) to CP, which receives this message at 68 but continues to wait for the completion of function B. RP2 completes execution of function B at 70, and signals this fact (“done B”) to CP, which at 72 continues execution of its sequence of instructions. Hence it can be seen that asynchronous RPCs allow parallelism between processors.
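The single-thread fork-and-join pattern of FIG. 2D may be sketched as follows, with two threads standing in for RP1 and RP2 and CP waiting on both completion events from one thread of control (again, a hypothetical model rather than any described implementation):

```python
import threading

results = {}
done_a = threading.Event()
done_b = threading.Event()

def rp1():
    results["A"] = sum(range(100))  # function A executed on RP1 (62 to 66)
    done_a.set()                    # "done A" to CP (66)

def rp2():
    results["B"] = sum(range(50))   # function B executed on RP2 (64 to 70)
    done_b.set()                    # "done B" to CP (70)

# CP starts both RPCs from a single thread (60 in FIG. 2D)...
threading.Thread(target=rp1).start()
threading.Thread(target=rp2).start()
# ...then waits for both completions before continuing (72 in FIG. 2D).
done_a.wait()
done_b.wait()
print(results["A"] + results["B"])  # CP combines both outputs
```

In contrast, achieving the same overlap with synchronous RPCs would require CP itself to run two parallel threads, one blocked in each RPC.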
Nevertheless, in practice asynchronous RPCs can be difficult for the programmer to use. Various asynchronous RPC libraries are known, but they all suffer from the problem of being hard to program. Some common errors include: suppressing the signalling of task completion, but still waiting for the task to complete; not suppressing the signalling of task completion and not waiting for the task to complete; writing too many RPC requests into a task queue of finite capacity; introducing a deadlock condition where the next RPC request on each of two different remote processors cannot start until it receives a signal indicating that the other RPC request has completed; and introducing race conditions where the behaviour of the program depends on the relative speeds of tasks running on different processors.
The IBM RPC library allows sequences of RPCs to be sent as one group. This reduces the inter-task latency associated with signalling and waiting identified above, but it cannot reduce inter-task latency when the RPCs execute on multiple processors. Furthermore it does not assist the programmer in avoiding the problems described above, such as the introduction of race conditions or omitting waits.
As such the programmer is typically faced with a choice between the simplicity and reliability of programming using synchronous RPCs and the performance benefits of using asynchronous RPCs.
Some discussions of the use of RPCs in the prior art can be found in the following: “Optimizing RPC”, Sandler D., COMP 520, Sep. 9, 2004; “Lightweight RPC”, Bershad B., Anderson T., Lazowska E., Levy H., 1990; and “Flick: A flexible, optimizing IDL compiler”, Eide E., Frei K., Ford B., Lepreau J. and Lindstrom G., ACM SIGPLAN '97, pages 44-56, Las Vegas, Nev., June 1997.
U.S. patent application Ser. Nos. 11/976,314 and 11/976,315 discuss the programming of multiprocessor systems. Some background information on the analysis of dependencies in the compilation of program code for such systems can be found in Chapters 9.0 to 9.2 of “Advanced Compiler Design and Implementation”, S. Muchnick, Morgan Kaufmann, 1997 and in “Conversion of control dependence to data dependence”, Allen J., Kennedy K., Porterfield C. and Warren J., 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (Austin, Tex., Jan. 24-26, 1983), ACM, New York, N.Y., 177-189.
It would be desirable to provide an improved technique for programming multiprocessor systems, which combines the simplicity of programming with synchronous RPCs and the performance benefits of using asynchronous RPCs.