Computer systems may contain multiple processors that work together to perform a task. For example, a computer system may contain four processors that share system resources (e.g., input devices or memory devices) and perform parallel processing. The processors may send messages to each other, and may send messages to and receive messages from the system resources. For example, such messages may include a request for information that is stored at a location in a memory device or a request to store information at a location in a memory device.
In a cache-coherent shared-memory multiprocessor, the set of data currently being used by a microprocessor may be copied from a system memory device such as a dynamic random access memory (DRAM) into a relatively smaller but faster cache memory device such as a static random access memory (SRAM). In such systems, a cache is said to be “coherent” if the information resident in the cache accurately reflects the information in the DRAM. Lack of cache coherency can occur when a requester for a memory location does not receive the latest copy of the data. For example, if one cache is updated while main memory is not, and a new requester for that location then receives the stale data from main memory into the requester's cache, the cache is said to be non-coherent.
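The non-coherent case described above can be illustrated with a minimal sketch. It assumes a toy model with one shared memory and two private write-back caches that perform no coherence checks; the `Cache` class, the address `0x40`, and the stored values are hypothetical and chosen only for illustration.

```python
# Hypothetical toy model: one shared main memory, one private cache per processor.
main_memory = {0x40: 1}

class Cache:
    def __init__(self):
        self.lines = {}  # address -> cached value

    def read(self, addr):
        # On a miss, fill the line from main memory with no coherence check.
        if addr not in self.lines:
            self.lines[addr] = main_memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Assumed write-back behavior: only the cache is updated, not main memory.
        self.lines[addr] = value

cache_a = Cache()
cache_b = Cache()

cache_a.read(0x40)
cache_a.write(0x40, 2)      # cache A now holds the latest value
stale = cache_b.read(0x40)  # cache B fills from main memory, which was never updated
```

In this sketch the second requester ends up with the old value from main memory while the first cache holds the newer one, which is the situation a coherence mechanism is meant to prevent.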
Cache “snooping” is a technique used to detect an access to memory that might cause a cache coherency problem. For example, in the case where the cache is updated while the memory is not, the memory request from the new requesting agent is snooped in the cache containing the updated data, and that cache, rather than memory, then supplies the data to the requester. In a multiprocessor system, the messages sent between processors may include cache snooping messages generated in accordance with a coherence protocol. A coherence protocol (e.g., the MESI protocol) is implemented to prevent cache coherency problems.
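The snooping behavior described above might be sketched as follows. This is a simplified illustration rather than a complete MESI implementation: it models only read requests, and the names `SnoopingCache`, `snoop_read`, and `read` are hypothetical. It assumes that a cache holding a line in the Modified state supplies the data, writes it back to memory, and downgrades its copy to Shared.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

main_memory = {0x40: 1}

class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> (State, value)

    def snoop_read(self, addr):
        """Observe another agent's read; return data only if this copy is Modified."""
        state, value = self.lines.get(addr, (State.INVALID, None))
        if state is State.MODIFIED:
            main_memory[addr] = value               # write the updated data back
            self.lines[addr] = (State.SHARED, value)
            return value
        if state is State.EXCLUSIVE:
            self.lines[addr] = (State.SHARED, value)
        return None

def read(requester, others, addr):
    """Read addr for requester, snooping every other cache before using memory."""
    state, value = requester.lines.get(addr, (State.INVALID, None))
    if state is not State.INVALID:
        return value, requester.name
    source, data = "memory", None
    for cache in others:
        supplied = cache.snoop_read(addr)
        if supplied is not None:
            source, data = cache.name, supplied
    if data is None:
        data = main_memory[addr]
    # The line is Shared if any other cache still holds a valid copy.
    shared = any(
        c.lines.get(addr, (State.INVALID, None))[0] is not State.INVALID
        for c in others
    )
    requester.lines[addr] = (State.SHARED if shared else State.EXCLUSIVE, data)
    return data, source

cache_a = SnoopingCache("A")
cache_b = SnoopingCache("B")
cache_a.lines[0x40] = (State.MODIFIED, 2)  # A was updated; memory still holds 1
value, source = read(cache_b, [cache_a], 0x40)
```

Here the request from cache B is snooped in cache A, so B receives the updated value from A rather than the stale value from memory.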
Typically, in order for the requester to receive updated data, the result of the cache coherence protocol is determined before the memory access is started. In a large-scale, multi-node, distributed-memory multiprocessor, the resolution of the cache coherence protocol may take a long time, and because the memory access does not start until that resolution completes, the delay adds directly to the memory latency for accessing data from memory.
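The latency effect described above can be shown with a small arithmetic sketch. It assumes the simplest possible model, in which the memory access starts only after the coherence result is known, so the two delays add. The function name and all nanosecond figures are hypothetical and are not measurements of any real system.

```python
def serialized_latency_ns(coherence_resolution_ns, memory_access_ns):
    """Total latency when the memory access waits for the coherence result."""
    return coherence_resolution_ns + memory_access_ns

# Hypothetical figures chosen only to show how the terms combine.
MEMORY_ACCESS_NS = 60
small_system = serialized_latency_ns(20, MEMORY_ACCESS_NS)   # few caches to snoop
large_system = serialized_latency_ns(200, MEMORY_ACCESS_NS)  # many nodes to resolve
```

Under these assumed figures the memory access time is unchanged, yet the total latency grows with the time needed to resolve the coherence protocol.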