1. Field of the Invention
The present invention relates generally to a large multi-processor system. Specifically, exemplary embodiments provide a computer implemented method, apparatus, and computer-usable program code for responding to a load instruction that missed in its local caches in a multi-processor network.
2. Description of the Related Art
Increasingly large symmetric multi-processor data processing systems are being built on multiple chips, which communicate with each other through a ring, where a request, known as a command, or data can be moved from one chip to another chip in the system. A chip is composed of one or more processors, a cache, a system memory, and input-output units.
As the system configuration grows, more chips are needed, the ring becomes longer, and more traffic is needed to ensure the correctness of system functions and data consistency. As traffic on the ring in a large system increases, power consumption rises and less ring bandwidth remains available, thereby degrading system performance.
The current art requires five phases to satisfy a read request, as follows:
1) Request phase: A read request is placed on the ring.
2) Reflected request phase: The arbiter reflects the request on the ring as a “reflected read request” for all snoopers. That is, the arbiter broadcasts the selected request to all snoopers on the ring.
3) Snoop phase: All snoopers in the system place their snoop reply information on the ring, which is forwarded to the arbiter that broadcast the request.
4) Combined Response phase: The arbiter combines the snoop reply information from all of the snoopers into a single response, called a “combined response,” and then places this combined response on the ring to be seen by all snoopers.
5) Data transfer phase: Data is transferred to the requester.
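By way of illustration only, the five phases listed above can be sketched as an ordered sequence of events for one read request that completes with a data transfer. The function name, the chip labels, and the event descriptions are hypothetical and do not appear in the current art; the sketch models ordering only, not ring topology or timing.

```python
from enum import Enum


class Phase(Enum):
    REQUEST = 1
    REFLECTED_REQUEST = 2
    SNOOP = 3
    COMBINED_RESPONSE = 4
    DATA_TRANSFER = 5


def read_request_phases(requester, snoopers):
    """Return the ordered (phase, description) events for one read request.

    Assumption: the request is not retried, so phase 5 takes place.
    """
    events = [(Phase.REQUEST, f"{requester} places a read request on the ring")]
    events.append(
        (Phase.REFLECTED_REQUEST,
         f"arbiter broadcasts the reflected read request to {len(snoopers)} snoopers")
    )
    # Every snooper answers; each reply travels back to the arbiter.
    for snooper in snoopers:
        events.append((Phase.SNOOP, f"{snooper} sends its snoop reply to the arbiter"))
    events.append(
        (Phase.COMBINED_RESPONSE, "arbiter places the combined response on the ring")
    )
    events.append((Phase.DATA_TRANSFER, f"data is transferred to {requester}"))
    return events
```

For example, `read_request_phases("chip0", ["chip1", "chip2"])` yields six events: one per phase, except the snoop phase, which contributes one event per snooper. This makes visible how the amount of ring traffic grows with the number of snoopers.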
In the current art, the arbiter just combines the snoop replies from all the snoopers, and sends the combined response information out on the ring to all the snoopers. The snoopers take appropriate action(s) based on the information contained within the combined response.
Although there is a very large variety of combined response information, depending on the particular implementation, the most important information is typically these three bits: the retry bit, the intervention bit, and the shared bit.
a) If the retry bit is set, all snoopers and the memory controller will stop working on the request and go idle; there will be no data transfer for the current request. The requester must resend its initial read request.
b) If the retry bit is not set, and the intervention bit is set, the memory controller will stop working on the request and go idle. The intervening cache will send the requested data to the requester (phase 5). The requester and the intervener caches update their cache states appropriately depending on the request type and the value of the shared bit in the combined response.
c) If neither the retry bit nor the intervention bit is set, the memory controller will continue to honor the request and will send the data to the requester (phase 5). The requester's cache updates its cache state based on the shared bit value in the combined response. If the shared bit is set, then the requester's cache changes cache state to shared. If the shared bit is not set, then the requester's cache may choose to accept the line in the exclusive state (depending on implementation). The exclusive state for a cache line means there are no other caches in the system that have a copy of that cache line.
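The handling of cases a) through c) can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular implementation: the current art says only that the arbiter combines the snoop replies, so the bitwise-OR combination used here is an assumption, and the names `Response`, `combine`, and `resolve` are hypothetical. For case b) the resulting requester cache state depends on the request type, which the text does not specify, so the sketch leaves it undetermined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """The three bits carried by a snoop reply or a combined response."""
    retry: bool = False
    intervention: bool = False
    shared: bool = False


def combine(replies):
    """Merge snoop replies into one combined response.

    Assumption: a bit is set in the combined response if any snooper set it.
    """
    return Response(
        retry=any(r.retry for r in replies),
        intervention=any(r.intervention for r in replies),
        shared=any(r.shared for r in replies),
    )


def resolve(combined):
    """Map a combined response to the action described in cases a) to c)."""
    if combined.retry:
        # Case a): no data transfer; the requester must resend its request.
        return {"data_from": None, "resend": True, "requester_state": None}
    if combined.intervention:
        # Case b): the intervening cache supplies the data. The new cache
        # state depends on the request type, so it is left as None here.
        return {"data_from": "intervening_cache", "resend": False,
                "requester_state": None}
    # Case c): the memory controller supplies the data. Accepting the line
    # as exclusive when the shared bit is clear is implementation dependent.
    state = "shared" if combined.shared else "exclusive"
    return {"data_from": "memory_controller", "resend": False,
            "requester_state": state}
```

In this sketch the retry bit is tested first, so a combined response with both the retry bit and the intervention bit set results in no data transfer, matching case a).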