This invention relates generally to data processing apparatus and, more particularly, to improved apparatus and methods for enhancing the performance of data processing operations.
In many types of data processing systems, each task is divided into several smaller pieces to permit concurrent performance of the task by several smaller data processors or modules. Although this approach permits each of these smaller data processing modules to be optimized for performance of its particular portion of the task, the transferring of the partially completed task between modules can itself involve considerable overhead and expense.
For example, a typical way of implementing data transfers between modules in a system such as described above is to provide input and output scratchpad queues for each module so as to permit data to be asynchronously transferred between modules. In such an implementation, a sending module places data to be transferred in its output queue and then proceeds without further concern for it. The output queue asynchronously transfers the data to the receiving module's input queue when the input queue is available to receive the data. Then, when the receiving module requires the transferred data, it simply reads the data from its input queue. Although the use of input and output queues in this manner solves the problem of having to synchronize module operations, it has the disadvantage that the queue structure is expensive and also involves considerable overhead.
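By way of illustration only, the queue-based transfer described above may be modeled in software as follows. This is a minimal sketch and not a description of any particular apparatus: the names Module, send, receive, can_receive, transfer and input_capacity are hypothetical, and the bounded input queue is an assumption used to represent a receiving queue that is not always available.

```python
from collections import deque


class Module:
    """Hypothetical processing module with input and output scratchpad queues."""

    def __init__(self, name, input_capacity=2):
        self.name = name
        self.input_queue = deque()
        self.output_queue = deque()
        # Assumed limit on how many items the input queue can hold.
        self.input_capacity = input_capacity

    def send(self, item):
        # The sender deposits the item in its output queue and continues
        # with other work; it does not wait for the receiver.
        self.output_queue.append(item)

    def can_receive(self):
        # The input queue is "available" while it has free space.
        return len(self.input_queue) < self.input_capacity

    def receive(self):
        # The receiver reads from its input queue when it needs data.
        # Returns None if nothing has arrived yet.
        return self.input_queue.popleft() if self.input_queue else None


def transfer(sender, receiver):
    """Move items from the sender's output queue to the receiver's input
    queue for as long as the receiver has room. Returns the number moved."""
    moved = 0
    while sender.output_queue and receiver.can_receive():
        receiver.input_queue.append(sender.output_queue.popleft())
        moved += 1
    return moved
```

In this sketch each item is held in two queues in turn and is moved by a separate transfer step that runs independently of both modules, which may be taken as a small-scale picture of the storage cost and overhead noted above.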
Of course, a more direct way of implementing data transfers between modules is to simply provide sufficient wiring between modules so that data can be directly transferred therebetween with little or no overhead penalty. However, such an approach not only requires a large and expensive interconnection structure, but also requires that the data be in registers rather than in more cost-effective scratchpads.