1. Technical Field
The present invention relates generally to computer software, and more particularly, to methods of transferring data between processors in a multiple processor data processing system wherein performance is maximized.
2. Description of Related Art
In a multi-processor system with local caches, when a bus device requests a piece of data, one of three conditions is possible. The first is that the requested data is not already present in the local caches of the other bus devices. In this situation, the data must be provided by the main memory. The second is that the requested data is present in the local cache of another bus device which has modified the data since it was fetched from the main memory. In this situation, the requesting device must retrieve the data from the device that has the modified copy of the data. This may be done either directly, via a cache-to-cache transfer between the two bus devices, or indirectly, i.e., by forcing the other bus device to update the data in the main memory and then allowing the requesting bus device to fetch the updated data from the main memory. The third is that the requested data is present in the local cache of one or more other bus devices that have not altered the data.
In the third case, some bus protocols allow one of the other bus devices to intervene in the memory access request and provide the data to the new requestor directly via a cache-to-cache transfer. Since such "shared-intervention" transfers can typically be accomplished in less time than a main-memory access, substantial performance improvements can be realized.
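By way of illustration only, the three conditions described above may be sketched as follows. The state names, the function name classify_request, and the returned labels are hypothetical and are not taken from any particular bus protocol; the sketch assumes that each other bus device reports one state for the requested data.

```python
from enum import Enum


class LineState(Enum):
    """Hypothetical state of the requested data in another device's cache."""
    INVALID = "invalid"    # data not present in that cache
    SHARED = "shared"      # data present and unmodified
    MODIFIED = "modified"  # data present and modified since it was fetched


def classify_request(other_cache_states):
    """Return which of the three conditions applies to a data request.

    other_cache_states: states reported by the other bus devices.
    """
    states = list(other_cache_states)
    if LineState.MODIFIED in states:
        # Second condition: the modified copy must supply the data,
        # either cache-to-cache or by way of a main memory update.
        return "modified-copy"
    if LineState.SHARED in states:
        # Third condition: a shared intervention may be possible.
        return "shared-copy"
    # First condition: no other cache holds the data.
    return "main-memory"
```

For example, a request for which every other device reports an invalid state is classified as "main-memory", while a single shared report among invalid reports yields "shared-copy".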
Furthermore, in some multi-processor systems, processors are grouped into multiple multi-processor nodes (i.e., two-level clustering). Data transfers between processors on the same node can be accomplished with a much shorter request-to-data latency than data transfers from memory to the processor, which, in turn, have a shorter latency than transfers between processors on different nodes. As a result, it is desirable to implement a selective form of shared intervention. If a processor having a shared copy of the requested data is on the same node as the requesting processor, the shared intervention is allowed to proceed. Otherwise, the shared intervention is blocked and the request is handled by the main memory unit. This mechanism allows the system to service the data request by whichever of the two paths has the shorter latency.
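The selective form of shared intervention described above may be sketched, under stated assumptions, as follows. The function name select_source, the use of integer node identifiers, and the returned tuples are hypothetical choices made for illustration; only the same-node rule itself comes from the description above.

```python
def select_source(requester_node, sharer_nodes):
    """Decide how a request for unmodified shared data is serviced.

    requester_node: node identifier of the requesting processor.
    sharer_nodes:   node identifiers of processors holding a shared copy.

    Returns ("cache", node) when a sharer is on the requester's node,
    so that the shared intervention is allowed to proceed; otherwise
    returns ("memory", None), so that the intervention is blocked and
    the main memory unit handles the request.
    """
    for node in sharer_nodes:
        if node == requester_node:
            return ("cache", node)
    return ("memory", None)
```

With a requester on node 0 and sharers on nodes 1 and 0, the sketch allows the intervention from node 0. With sharers only on nodes 1 and 2, the request falls back to main memory, reflecting that a memory access has a shorter latency than a transfer between nodes.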
However, in current methods of implementing shared response systems, the processor that sends the requested data to the requesting processor is selected based solely on the priorities of the responses received from all of the processors indicating which processors are in possession of the requested data. As a result, shared interventions are sometimes awarded to processors on remote nodes, which produces the longest read-to-data-valid latency and requires higher utilization of system data busses and buffers. Thus, the overall effect when shared intervention is awarded to processors on remote nodes is a lower overall system bandwidth. Therefore, it is desirable to have a method and system of transferring data between processors of a multi-processor data processing system having improved efficiency.
The present invention provides a method for transferring data between processors in a multiple processor data processing system. In a preferred embodiment a request for data is received from a requesting processor at a transaction response collection and distribution logic unit. The request for data from the requesting processor is broadcast to all processors of the data processing system. The transaction response collection and distribution logic unit receives an individual response from each of the plurality of processors, wherein the individual response specifies the state of the requested data within a cache associated with the individual processor. The transaction response collection and distribution logic unit evaluates all received responses and provides each processor with an appropriate final response state. The final response state determines which processor and associated memory cache will send the requested data to the requesting processor or if the requested data will be provided from the system's main memory.
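The collect, evaluate, and distribute sequence described above may be sketched as follows. This is a minimal illustration only: the Response type, the state strings, the final response labels "send_data" and "no_action", and in particular the selection rule (a modified copy first, then a shared copy on the requester's node, otherwise main memory) are assumptions introduced for the example and are not asserted to be the rule used by the preferred embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """Hypothetical individual response from one processor."""
    processor: int  # identifier of the responding processor
    node: int       # node on which that processor resides
    state: str      # "invalid", "shared", or "modified"


def evaluate_responses(requester_node, responses):
    """Evaluate all responses and build a final response per processor.

    Returns (source, final), where source is the identifier of the
    processor selected to send the data (None means main memory) and
    final maps each processor identifier to its final response state.
    """
    modified = [r for r in responses if r.state == "modified"]
    local_shared = [
        r for r in responses
        if r.state == "shared" and r.node == requester_node
    ]
    if modified:
        source = modified[0].processor      # assumed: modified copy wins
    elif local_shared:
        source = local_shared[0].processor  # assumed: same-node sharer
    else:
        source = None                       # main memory supplies the data
    final = {
        r.processor: "send_data" if r.processor == source else "no_action"
        for r in responses
    }
    return source, final
```

In this sketch every processor receives a final response state, and at most one processor is told to send the data. When no processor is selected, the requested data is provided from main memory.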