Large-scale data processing systems typically use a tremendous amount of memory. This is particularly true in multiprocessing systems where multiple processing units and numerous input/output modules are implemented. Several memory methodologies known in the art provide for efficient use of memory in such multiprocessing environments. One such methodology is distributed memory, where each processor has access to its own dedicated memory, and access to another processor's memory involves sending messages via an inter-processor network. While distributed memory structures avoid problems of contention for memory and can be implemented relatively inexpensively, they are usually slower than other memory methodologies, such as shared memory systems.
Shared memory is used in a parallel, or multiprocessing, system, and can be accessed by more than one processor. The shared memory is connected to the multiple processing units, typically by a shared bus or network. Large-scale shared memories may be designed to cooperate with local cache memories associated with each processor in the system. Cache consistency protocols, or coherency protocols, ensure that one processor's locally-stored copy of a shared memory location is invalidated when another processor writes to that shared memory location.
More particularly, when multiple cache memories are coupled to a single main memory for the purpose of temporarily storing data signals, some mechanism must be utilized to ensure that all processors, such as instruction processors (IPs), are working from the same (most recent) copy of the data. For example, if a copy of a data item is stored and subsequently modified in a cache memory, another IP requesting access to the same data item must be prevented from using the older copy of the data item stored either in main memory or in the requesting IP's cache. This is referred to as maintaining "cache coherency." Maintaining cache coherency becomes more difficult as more caches are added to the system, since more copies of a single data item may have to be tracked.
For distributed systems having hierarchical memory structures, a cache directory is a practical manner of maintaining cache coherency. Directory-based coherency systems utilize a centralized directory to record the location and the status of data as it exists throughout the system. For example, the directory records which caches have a copy of the data, and further records if any of the caches have an updated copy of the data. When a cache makes a request to main memory for a data item, the central directory is consulted to determine where the most recent copy of that data item resides. Based on this information, the most recent copy of the data is retrieved so it may be provided to the requesting cache. The central directory is then updated to reflect the new status for that unit of memory.
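The directory lookup described above can be illustrated with a minimal sketch. The names used here (`DirectoryEntry`, `Directory`, `sharers`, `owner`) are hypothetical and chosen for illustration only; the sketch assumes whole-line writes and a single central directory, and is not intended to describe any particular implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    """Hypothetical per-line state: which caches hold copies, and which one
    (if any) holds an updated copy."""
    sharers: set = field(default_factory=set)
    owner: Optional[str] = None


class Directory:
    def __init__(self, main_memory, caches):
        self.main_memory = main_memory   # address -> value
        self.caches = caches             # cache name -> {address: value}
        self.entries = {}                # address -> DirectoryEntry

    def read(self, requester, address):
        entry = self.entries.setdefault(address, DirectoryEntry())
        if entry.owner == requester:
            # The requester already holds the most recent copy.
            return self.caches[requester][address]
        if entry.owner is not None:
            # The most recent copy is in the owner's cache: return it to
            # main memory before serving the request.
            self.main_memory[address] = self.caches[entry.owner][address]
            entry.sharers.add(entry.owner)
            entry.owner = None
        value = self.main_memory[address]
        self.caches[requester][address] = value
        entry.sharers.add(requester)     # record the new status
        return value

    def write(self, requester, address, value):
        entry = self.entries.setdefault(address, DirectoryEntry())
        holders = set(entry.sharers)
        if entry.owner is not None:
            holders.add(entry.owner)
        # Invalidate every other cached copy before granting ownership.
        for name in holders:
            if name != requester:
                self.caches[name].pop(address, None)
        entry.sharers = set()
        entry.owner = requester
        self.caches[requester][address] = value


memory = {0x100: 1}
caches = {"A": {}, "B": {}}
directory = Directory(memory, caches)
directory.read("B", 0x100)            # B caches the original value
directory.write("A", 0x100, 2)        # B's copy is invalidated; A owns the line
latest = directory.read("B", 0x100)   # directory locates the updated copy in A
```

In this sketch the final read by cache B is served from the copy that cache A modified, and the directory entry is updated to show both caches as sharers with no owner.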
Along with IP caching, it is also desirable to allow input/output (I/O) units to maintain copies of memory data for either read or read/write purposes. This becomes particularly important as the number of I/O units in the system increases. This localized I/O storage may be accomplished by coupling one or more of the I/O units to shared I/O cache memories or other I/O buffers.
In addition to maintaining cache coherency, multiprocessing systems such as Symmetrical Multi-Processor (SMP) systems require "processor consistency." This means that all processors of the multiprocessor system, including I/O processors, processing module processors such as instruction processors and the like, collectively observe modifications to storage locations in the same order that they were modified by individual processors. For example, assume two processors referencing storage locations L1 and L2. A first processor, Processor A, first writes to location L1 and then to location L2. Assume that a second processor, Processor B, wants to read location L2 followed by location L1. If Processor B were to recognize that the information in location L2 was newly updated by Processor A, then Processor B would know that L1 would also contain new data, since L1 was written by Processor A before L2 was written by Processor A. An application of this consistency rule can be realized by implementing memory locking mechanisms. That is, if an updated copy of data exists within a local cache, other processors are prohibited from obtaining a copy of the data from main memory until the updated copy is returned to main memory, thereby releasing the lock. More specifically, Processor A will change a data structure in location L1, and set a lock cell in location L2. Processor B will first read the lock cell in location L2 to determine whether there is new data available in location L1. If Processor B recognizes that the lock cell is set, it knows that the new data structure in location L1 is present and will thereafter make reference to it. Until then, it is "locked out" in order to prevent Processor B from obtaining invalid data.
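The L1/L2 example above can be expressed as a small sequential model. This is a sketch only: it assumes that Processor A's writes become visible strictly in program order, and the function names and the `OLD`/`NEW` values are hypothetical.

```python
OLD, NEW = "old data", "new data"


def processor_a_writes():
    """Processor A's program order: update L1, then set the lock cell in L2."""
    return [("L1", NEW), ("L2", 1)]


def processor_b_reads(memory):
    """Processor B's program order: read the lock cell in L2, then read L1."""
    flag = memory["L2"]
    data = memory["L1"]
    return flag, data


def observations_in_order():
    """Let B observe memory after each prefix of A's writes, applied in order."""
    memory = {"L1": OLD, "L2": 0}
    seen = [processor_b_reads(memory)]
    for location, value in processor_a_writes():
        memory[location] = value
        seen.append(processor_b_reads(memory))
    return seen


seen = observations_in_order()
# Processor consistency: whenever B sees the lock cell set, it also sees new L1.
consistent = all(data == NEW for flag, data in seen if flag == 1)
```

Under the in-order assumption, Processor B never observes the lock cell set while location L1 still holds the old data, which is the property that processor consistency is meant to guarantee.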
Such a consistency rule becomes increasingly difficult to apply in systems having multiple caches and multiple data paths. For example, in a cache-based system, location L1 could currently reside in Processor B's cache. Processor A, which wants to update the data at location L1 currently owned by Processor B, will typically cause a memory controller to "invalidate" the data at location L1 in Processor B's cache and cause the valid data in Processor B's cache to be returned to a main storage area. Processor A thereafter might deliver a new value to the main storage area for location L2. However, this new value can potentially be read by Processor B before the invalidate signal reaches Processor B. In such an instance, Processor B would recognize a new value at location L2, but would erroneously read its own cache to obtain the data at L1, since the invalidate signal had not yet reached Processor B. In other words, "invalidate" traffic and "data delivery" traffic do not necessarily travel the same paths within the system, and therefore could encounter different delays as they flow to their destinations due to different data paths, queuing structures, and the like. Such a condition may cause a violation of the consistency rule required to maintain processor consistency.
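The race between invalidate traffic and data delivery traffic can be illustrated with a simplified timing model. The latencies below are hypothetical values in arbitrary time units, and the model assumes that Processor B reads location L2 at the moment the new value becomes visible and then immediately reads location L1.

```python
def processor_b_view(invalidate_latency, data_latency):
    """Return what Processor B reads for (L2, L1) under the given path delays."""
    l1_written_at, l2_written_at = 0, 1        # Processor A's program order
    invalidate_arrives = l1_written_at + invalidate_latency
    l2_visible = l2_written_at + data_latency

    main_memory = {"L1": "new", "L2": 1}       # state once both writes have landed
    b_cache = {"L1": "old"}                    # B's copy from before A's update

    # B reads L2 as soon as the new value is visible, then reads L1.
    read_time = l2_visible
    if invalidate_arrives <= read_time:
        b_cache.pop("L1", None)                # the invalidate arrived first
    l2 = main_memory["L2"]
    l1 = b_cache.get("L1", main_memory["L1"])  # a still-valid cache line wins
    return l2, l1


in_order = processor_b_view(invalidate_latency=1, data_latency=1)
raced = processor_b_view(invalidate_latency=5, data_latency=1)
```

When the invalidate path is slower than the data delivery path, as in the second call, Processor B observes the new value at L2 together with the stale value of L1 from its own cache, which violates the consistency rule.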
One manner of managing such a processor consistency quandary in a directory-based cache coherency scheme is to delay the data delivery from a targeted location until all of the associated coherency functions for that particular cache line have been sent. This, however, results in undesirable latencies that adversely affect overall system performance. It would therefore be desirable to provide a system and method for providing processor consistency between processors in a multiprocessing, multi-cached system without experiencing undesirable time delays where one processor requires data owned by another processor. The present invention provides a solution to the shortcomings of the prior art, and offers numerous advantages over existing processor coherency methodologies.
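The latency cost of the delayed-delivery approach can be sketched as follows. The function name, its parameters, and the timing values are hypothetical and are given in arbitrary time units; the sketch assumes only that held data cannot leave the targeted location before the last invalidate for the cache line has been sent.

```python
def delivery_time(data_ready, invalidate_sent_times, hold_for_coherency):
    """Time at which data leaves the targeted location.

    hold_for_coherency=True models the delayed-delivery approach: the data
    waits until every invalidate for the cache line has been sent.
    """
    if hold_for_coherency and invalidate_sent_times:
        return max(data_ready, max(invalidate_sent_times))
    return data_ready


eager = delivery_time(data_ready=2, invalidate_sent_times=[3, 7],
                      hold_for_coherency=False)
held = delivery_time(data_ready=2, invalidate_sent_times=[3, 7],
                     hold_for_coherency=True)
added_latency = held - eager
```

In this example the data is ready at time 2 but is held until time 7, so the requesting processor sees five additional time units of latency even though the data itself was available earlier.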