1. Field of the Invention
This invention relates generally to the field of computer systems and, more particularly, to accelerating cache-to-cache transfers in computer systems with multiple processors.
2. Description of the Related Art
Data processing systems often include multiple processors that share a common system memory. As used herein, the term processor refers to an active device that uses a system memory. Processors can include microprocessors, input-output bridges, graphics devices, peripheral devices, or other devices that read from or write to a system memory. Processors in data processing systems often include a cache to improve performance. Caches in a multiple processor system can include cache tags for each cache line. The cache tags can specify the access rights and ownership responsibilities for a corresponding processor.
Caches in a multiprocessor shared memory system need to be maintained in a consistent and coherent manner. A set of cache protocol rules can be employed to ensure that data stored in a system memory appears the same to all processors despite being stored in various caches. One approach to cache consistency is the use of a standard snooping protocol. In a standard snooping protocol, each processor broadcasts all of its requests for cache lines to all of the other processors on an address bus. Each processor, in turn, “snoops” the requests from other processors and responds, as needed, by updating its cache tags and/or conveying the data corresponding to the cache line to the other processor. In a standard snooping protocol, requests arrive at all processors in the same order, and each processor processes the requests in the order that they arrive. A processor can be said to ‘process’ a request when the request affects the internal state of the processor. In a standard snooping protocol, requests include local requests and foreign requests. Local requests include requests generated by the processor itself while foreign requests include requests from other processors. Requests can also be referred to as address packets since requests typically specify a cache line by its address. Address packets can also be referred to as address broadcasts. The terms request, address packet, and address broadcast will be used interchangeably herein.
The requirement that requests be processed in the order in which they arrive can be considered a performance drawback of a standard snooping protocol. In particular, this requirement can delay the transfer of data from one cache to another cache. The requirement can also delay the processing of data received by a processor if the data corresponds to a local request and arrives before data that corresponds to an earlier local request. In processing a local request, a processor waits for the data corresponding to the local request to arrive before processing other requests. Potentially, multiple processors can be waiting for a single processor to receive its data. This situation can create an undesirable latency in the system. If a processor is allowed to process other requests before the data corresponding to a local request arrives, however, starvation can result if a subsequent request revokes the processor's access rights to the cache line of the local request before the processor receives the data. What is needed is a system that reduces the latency of a standard cache consistency protocol without introducing starvation problems into the system.
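The in-order drawback described above can be illustrated with a minimal Python sketch. It is an illustration only, not part of any described apparatus: the function name `process_in_order`, the packet names, and the `(name, awaiting_data)` tuple layout are all hypothetical.

```python
from collections import deque


def process_in_order(packets, data_ready):
    """Process (name, awaiting_data) packets strictly in arrival order.

    `data_ready` is the set of packet names whose data has arrived.
    Returns (processed names, names still stalled in the queue).
    """
    queue = deque(packets)
    done = []
    while queue:
        name, awaiting_data = queue[0]
        if awaiting_data and name not in data_ready:
            # Head-of-line blocking: nothing behind this packet may proceed.
            break
        done.append(name)
        queue.popleft()
    return done, [name for name, _ in queue]


# A local request still waiting for data stalls a newer foreign request.
arrivals = [("local_A", True), ("foreign_B", False)]
stalled = process_in_order(arrivals, data_ready=set())
resumed = process_in_order(arrivals, data_ready={"local_A"})
```

In the first call nothing is processed, even though the foreign request does not depend on the missing data; once the data for the local request arrives, both packets are processed in arrival order.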
The problems outlined above are in large part solved by the use of the apparatus and method described herein. Generally speaking, an apparatus and method for expediting the processing of requests in a multiprocessor shared memory system is provided. In a multiprocessor shared memory system, requests can be processed in any order provided two rules are followed. First, no request that grants access rights to a processor can be processed before an older request that revokes access rights from the processor. Second, all requests that reference the same cache line are processed in the order in which they arrive. In this manner, requests can be processed out-of-order to allow cache-to-cache transfers to be accelerated. In particular, foreign requests that require a processor to provide data can be processed by that processor before older local requests that are awaiting data. In addition, newer local requests can be processed before older local requests. As a result, the apparatus and method described herein may advantageously increase performance in multiprocessor shared memory systems by reducing latencies associated with a cache consistency protocol.
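The two ordering rules can be expressed as a check on a proposed processing order at a single processor. The following Python sketch is illustrative only; the `Packet` fields (`seq` for arrival order, `line` for the cache line address, `grants`, `revokes`) and the function `is_legal_order` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Packet:
    seq: int               # arrival order at this processor (hypothetical field)
    line: int              # cache line address referenced by the packet
    grants: bool = False   # grants access rights to this processor
    revokes: bool = False  # revokes access rights from this processor


def is_legal_order(processed):
    """Return True if `processed` (packets in processing order) obeys both rules."""
    position = {p.seq: i for i, p in enumerate(processed)}
    by_arrival = sorted(processed, key=lambda p: p.seq)
    for older, newer in combinations(by_arrival, 2):
        if position[newer.seq] > position[older.seq]:
            continue  # this pair was kept in arrival order
        if newer.line == older.line:
            return False  # rule 2: same cache line must stay in arrival order
        if newer.grants and older.revokes:
            return False  # rule 1: a grant may not pass an older revoke
    return True


local = Packet(0, 0xA0, grants=True)    # older local request awaiting data
foreign = Packet(1, 0xB0)               # newer foreign request, different line
same_line = Packet(2, 0xA0, revokes=True)

ok = is_legal_order([foreign, local, same_line])
bad_same_line = is_legal_order([same_line, local, foreign])
bad_grant = is_legal_order([Packet(1, 0xD0, grants=True),
                            Packet(0, 0xC0, revokes=True)])
```

Under these assumptions, the newer foreign request may be processed ahead of the older local request, while reordering two packets for the same cache line, or moving a grant ahead of an older revoke, is rejected.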
In one embodiment, a processor can include a first queue and a second queue to implement the above rules. The first and second queues can be operated in a first-in-first-out (FIFO) manner. The processor can store address packets that grant access rights to the processor in the first queue. The processor can also store in the first queue address packets that reference the same cache line as an address packet in the first queue. The processor can store all other address packets in the second queue. Address packets that require data can remain at the head of a queue until the data arrives. The processor can be configured to process address packets from the head of either queue in any order. In this manner, foreign packets that require data can be handled before older local packets that are awaiting data.
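The routing of address packets into the two queues can be sketched in Python as follows. This is a simplified model under stated assumptions, not the claimed apparatus: the class `TwoQueueProcessor`, its methods, and the `Packet` fields are hypothetical, and the choice to examine the second queue before the first is an arbitrary policy picked for the example. The sketch models only the routing and head-of-queue behavior described above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int                     # arrival order (hypothetical field)
    line: int                    # cache line address
    grants: bool = False         # grants access rights to this processor
    awaiting_data: bool = False  # cannot be processed until its data arrives


class TwoQueueProcessor:
    def __init__(self):
        self.first = deque()     # grants, plus packets matching a line in this queue
        self.second = deque()    # all other address packets
        self.arrived = set()     # seq numbers whose data has arrived
        self.processed = []      # seq numbers in the order they were processed

    def receive(self, pkt):
        same_line = any(q.line == pkt.line for q in self.first)
        target = self.first if (pkt.grants or same_line) else self.second
        target.append(pkt)

    def data_arrived(self, seq):
        self.arrived.add(seq)

    def step(self):
        """Process one ready packet from the head of either queue, if any."""
        for queue in (self.second, self.first):  # arbitrary choice of order
            if queue:
                head = queue[0]
                if not head.awaiting_data or head.seq in self.arrived:
                    queue.popleft()
                    self.processed.append(head.seq)
                    return head
        return None


cpu = TwoQueueProcessor()
cpu.receive(Packet(0, 0xA0, grants=True, awaiting_data=True))  # local request
cpu.receive(Packet(1, 0xB0))   # foreign request, different cache line
cpu.receive(Packet(2, 0xA0))   # same line as packet 0, so it joins the first queue
cpu.step()                     # packet 1 is handled ahead of older packet 0
blocked = cpu.step()           # packet 0 stays at the head until its data arrives
cpu.data_arrived(0)
cpu.step()
cpu.step()
```

In this run the foreign packet is processed first, while the two packets that reference the same cache line leave the first queue in arrival order once the data for the local request arrives.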