Large-scale data processing systems typically utilize a tremendous amount of memory. This is particularly true in multiprocessing systems where multiple processing units and numerous input/output modules are implemented. There are several memory methodologies known in the art that provide for efficient use of memory in such multiprocessing environments. One such memory methodology is distributed memory, in which each processor has access to its own dedicated memory, and access to another processor's memory involves sending messages via an inter-processor network. While distributed memory structures avoid problems of contention for memory and can be implemented relatively inexpensively, they are usually slower than other memory methodologies, such as shared memory systems.
Shared memory is used in a parallel or multiprocessing system, and can be accessed by more than one processor. The shared memory is connected to the multiple processing units, typically by means of a shared bus or network. Large-scale shared memories may be designed to cooperate with local cache memories associated with each processor in the system. Cache consistency protocols, or coherency protocols, ensure that one processor's locally-stored copy of a shared memory location is invalidated when another processor writes to that shared memory location.
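The invalidation behavior described above may be illustrated by the following minimal sketch. It is an illustration only and does not represent any particular hardware design; the class name `SharedMemory`, its methods, the write-through policy, and the default value of zero for unwritten locations are all assumptions made for the example.

```python
class SharedMemory:
    """Toy model: one shared memory plus one local cache per processor."""

    def __init__(self, num_processors):
        self.main = {}                                   # address -> value
        self.caches = [dict() for _ in range(num_processors)]

    def read(self, proc, addr):
        cache = self.caches[proc]
        if addr not in cache:
            # Cache miss: fetch from shared memory (0 if never written; assumption).
            cache[addr] = self.main.get(addr, 0)
        return cache[addr]

    def write(self, proc, addr, value):
        # Invalidate every other processor's locally-stored copy of addr.
        for other, cache in enumerate(self.caches):
            if other != proc:
                cache.pop(addr, None)
        self.caches[proc][addr] = value
        # Write-through to shared memory keeps the sketch short (assumption).
        self.main[addr] = value


mem = SharedMemory(num_processors=2)
mem.write(0, 0x10, 1)
first = mem.read(1, 0x10)        # processor 1 now caches the location
mem.write(0, 0x10, 2)            # processor 1's copy is invalidated
second = mem.read(1, 0x10)       # the miss forces a fresh fetch
```

In this sketch the second read by processor 1 cannot observe the older value, because the write by processor 0 removed the stale copy from processor 1's cache.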
More particularly, when multiple cache memories are coupled to a single main memory for the purpose of temporarily storing data signals, some mechanism must be utilized to ensure that all processors, such as instruction processors (IPs), are working from the same (most recent) copy of the data. For example, if a copy of a data item is stored, and subsequently modified, in a cache memory, another IP requesting access to the same data item must be prevented from using the older copy of the data item stored either in main memory or in the requesting IP's cache. This is referred to as maintaining cache coherency. Maintaining cache coherency becomes more difficult as more caches are added to the system, since more copies of a single data item may have to be tracked.
Many methods exist to maintain cache coherency. Some earlier systems achieved coherency by implementing memory locks. That is, if an updated copy of data existed within a local cache, other processors were prohibited from obtaining a copy of the data from main memory until the updated copy was returned to main memory, thereby releasing the lock. For complex systems, the additional hardware and/or operating time required for setting and releasing the locks within main memory cannot be justified. Furthermore, reliance on such locks directly prohibits certain types of applications such as parallel processing. Other manners of maintaining cache coherency also exist, such as memory bus "snooping".
For distributed systems having hierarchical memory structures, a directory-based coherency system becomes more practical. Directory-based coherency systems utilize a centralized directory to record the location and the status of data as it exists throughout the system. For example, the directory records which caches have a copy of the data, and further records if any of the caches have an updated copy of the data. When a cache makes a request to main memory for a data item, the central directory is consulted to determine where the most recent copy of that data item resides. Based on this information, the most recent copy of the data is retrieved so it may be provided to the requesting cache. The central directory is then updated to reflect the new status for that unit of memory.
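The directory lookup described above may be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any specific directory implementation: the class `Directory`, its field names, the policy that a previous owner retains a shared copy after a write-back, and the default value of zero are all hypothetical.

```python
class Directory:
    """Toy central directory tracking where copies of each data item reside."""

    def __init__(self, cache_ids):
        self.memory = {}                         # address -> value in main memory
        self.caches = {cid: {} for cid in cache_ids}
        self.sharers = {}                        # address -> set of caches holding a copy
        self.owner = {}                          # address -> cache holding an updated copy

    def read(self, requester, addr):
        owner = self.owner.get(addr)
        if owner == requester:
            return self.caches[requester][addr]
        if owner is not None:
            # The most recent copy resides in another cache: retrieve it first.
            self.memory[addr] = self.caches[owner][addr]
            del self.owner[addr]                 # old owner keeps a shared copy (assumption)
        value = self.memory.get(addr, 0)
        self.caches[requester][addr] = value
        # Update the directory to reflect the new status of this unit of memory.
        self.sharers.setdefault(addr, set()).add(requester)
        return value

    def write(self, requester, addr, value):
        # Invalidate all other recorded copies before granting ownership.
        for holder in self.sharers.get(addr, set()) - {requester}:
            self.caches[holder].pop(addr, None)
        self.caches[requester][addr] = value
        self.sharers[addr] = {requester}
        self.owner[addr] = requester             # main memory is now stale


d = Directory(["A", "B"])
d.memory[0x40] = 7
d.read("A", 0x40)
d.write("A", 0x40, 8)            # cache A holds the only up-to-date copy
latest = d.read("B", 0x40)       # directory locates the copy in cache A
```

Here the read by cache B is satisfied with the value held by cache A rather than the stale value in main memory, because the directory records which cache holds the updated copy.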
However, due to the ever-increasing number of processing devices and I/O modules capable of being configured within a single system, it has become quite common for one processing module or I/O module to request data that is currently "owned" by another processing or I/O module. Typically, such a request involves a first request by a central supervisory memory, such as a main computer memory, to have the data and ownership returned to that central memory. The current owner of the valid data then transfers both the current data and ownership back to the central memory, which must then provide the data to the requesting processing or I/O module in a conventional manner. While such a system facilitates cache coherency maintenance, it results in unnecessary latencies, particularly where there are a large number of potential local or cache memories external to the main memory.
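The source of the latency in the conventional arrangement described above can be seen by counting the transfers involved. The following sketch is illustrative only; the function name `fetch_via_main_memory`, the message labels, and the representation of caches as dictionaries are assumptions made for the example.

```python
def fetch_via_main_memory(requester, owner, addr, caches, memory):
    """Trace the conventional path in which owned data returns to main memory first."""
    trace = []
    # 1. The requester asks main memory for the data item.
    trace.append((requester, "memory", "request"))
    # 2. Main memory asks the current owner to return data and ownership.
    trace.append(("memory", owner, "return request"))
    # 3. The owner transfers the current data and ownership back to main memory.
    memory[addr] = caches[owner].pop(addr)
    trace.append((owner, "memory", "data and ownership"))
    # 4. Main memory provides the data to the requester.
    caches[requester][addr] = memory[addr]
    trace.append(("memory", requester, "data"))
    return trace


caches = {"A": {0x80: 5}, "B": {}}
memory = {0x80: 1}               # stale copy; cache A owns the current value
trace = fetch_via_main_memory("B", "A", 0x80, caches, memory)
```

Under these assumptions the data item crosses the interface to main memory twice, once inbound from the owner and once outbound to the requester, before the requester can use it.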
It would therefore be desirable to provide a system and method for directly transferring locally-stored data, such as cached data, between requesters while maintaining cache coherency. The present invention provides a solution to the shortcomings of the prior art, and offers numerous advantages over existing cache coherency methodologies.