1. Technical Field
The present invention relates to a method and system for maintaining cache coherency in general and, in particular, to a method and system for maintaining cache coherency within a data-processing system. Still more particularly, the present invention relates to a method and system of providing a cache-coherency protocol for maintaining cache coherency within a multiprocessor data-processing system.
2. Description of the Prior Art
In a symmetric multiprocessor (SMP) data-processing system, all of the processing units are generally identical; that is, they all utilize a common set or subset of instructions and protocols to operate and, generally, have the same architecture. Each processing unit includes a processor core having multiple registers and execution units for carrying out program instructions. Each processing unit may also have one or more primary caches (i.e., level-one, or L1, caches), such as an instruction cache and/or a data cache, which are implemented utilizing high-speed memories. In addition, each processing unit may include an additional cache, typically referred to as a secondary or level-two (L2) cache, for supporting the primary caches mentioned above.
Within an SMP environment, it is important to provide a memory-coherency scheme, such that read or write operations to each individual memory location are serialized in some order for all processors. In other words, all processing units will observe the read or write operation to a given memory location in a given order.
As for the caches, there are a number of protocols and techniques for achieving cache coherency that are well-known to those skilled in the art. Not surprisingly, all these protocols grant only one processing unit the "permission" to write to a cache line at any given point in time. As a consequence of this requirement, whenever a processing unit attempts to write to a cache line, it must first inform all other processing units of its desire to write to the cache line and receive permission from all other processing units before carrying out the write operation.
In order to communicate with the other processing units, a requesting processing unit must pass messages over the interconnect (such as a bus) indicating its desire to read or write a cache line. When the request is placed on the interconnect, all of the other processing units "snoop" (or monitor) this operation and decide if the state of their caches can allow the requested operation to proceed and, if so, under what conditions. There are several bus transactions that require snooping and follow-up action to honor the bus transactions and maintain cache coherency. The snooping operation is triggered by the receipt of a qualified snoop request, generated by the assertion of certain bus signals. Instruction processing is interrupted only when a snoop hit occurs and a snoop state machine determines whether or not an additional cache snoop is required to resolve the coherency of the affected cache line.
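The request-and-snoop exchange described above can be sketched as follows. This is a minimal illustration, not the mechanism of any particular bus: the `Cache` class, the `snoop_write` and `request_write` names, the state strings, and the choice to have each snooper invalidate its own copy before granting permission are all assumptions made for the example.

```python
class Cache:
    """Hypothetical per-processor cache; maps an address to a state name."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def snoop_write(self, addr):
        # Assumption: a snooper drops its own copy so that the requester
        # becomes the only cache permitted to write the line, then grants.
        self.lines.pop(addr, None)
        return True


def request_write(requester, others, addr):
    """Broadcast a write request; proceed only if every snooper grants it."""
    grants = [cache.snoop_write(addr) for cache in others]
    if all(grants):
        requester.lines[addr] = "Modified"
        return True
    return False


# Two caches hold the line as Shared; P0 asks for permission to write it.
p0, p1, p2 = Cache("P0"), Cache("P1"), Cache("P2")
p1.lines[0x40] = "Shared"
p2.lines[0x40] = "Shared"
granted = request_write(p0, [p1, p2], 0x40)
```

After the call, only `p0` holds the line, which matches the requirement that a single processing unit has write permission at any given time.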
This kind of communication is required because the most recent, valid copy of information may have been moved from the system memory to one or more of the caches within the system. In fact, the correct version of the information may be in either the system memory or one of the caches within the system, or both. Thus, if the correct version is in one or more of the other caches within the system, it is important to obtain the correct value from the cache(s) rather than the system memory.
In order to achieve cache coherency within the system, a state-bit field is utilized to indicate the current "state" of a cache line. This state information is then utilized to allow certain optimizations in the cache-coherency protocol for reducing message traffic on the generalized interconnect and the inter-cache connections. As one example of this mechanism, when a processing unit executes a read, it receives a message indicating whether or not the read must be retried later. If the read operation is not retried, the message usually also includes information allowing the processing unit to determine if any other processing unit also has a valid and active copy of the information (this is accomplished by having the other lowest-level caches give a "shared" or "not shared" indication for any read they do not retry). Therefore, a processing unit can determine whether or not any other processing unit in the system has a copy of the information. If no other processing unit has an active copy of the information, the reading processing unit marks the state-bit field of the cache line as Exclusive. If a cache line is marked as Exclusive, the processing unit is permitted to later write the cache line without first communicating with other processing units in the system, because no other processing unit has a copy of the same information. Therefore, it is possible for a processing unit to read or write a cache line without first communicating this intention via the interconnect, but only when the coherency protocol has ensured that no other processing unit has an interest in the same information.
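The read example above amounts to a small decision rule, sketched here under stated assumptions. The `State` and `SnoopResponse` enumerations and both function names are hypothetical; the Invalid state is included only because it is conventional in four-state protocols of this kind and is not named in the text above.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"  # assumption: conventional fourth state


class SnoopResponse(Enum):
    RETRY = "retry"
    SHARED = "shared"
    NOT_SHARED = "not shared"


def state_after_read(responses):
    """Return the state the reader assigns, or None if the read must be retried."""
    if SnoopResponse.RETRY in responses:
        return None
    if SnoopResponse.SHARED in responses:
        return State.SHARED
    # No other cache reported a copy, so the reader holds the only one.
    return State.EXCLUSIVE


def write_needs_bus(state):
    """A line held Exclusive or Modified can be written without a bus message."""
    return state not in (State.EXCLUSIVE, State.MODIFIED)


alone = state_after_read([SnoopResponse.NOT_SHARED, SnoopResponse.NOT_SHARED])
shared = state_after_read([SnoopResponse.NOT_SHARED, SnoopResponse.SHARED])
```

Here `alone` is `State.EXCLUSIVE`, so a later write needs no interconnect traffic, while `shared` is `State.SHARED` and a write must first be announced.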
A further improvement in accessing cache blocks can be achieved by utilizing a procedure known as "intervention." An intervention procedure allows a cache to have control over a memory block for providing the data or instruction in that block directly to another cache requesting the value (for a read-type operation). In other words, the intervention procedure bypasses the need to first write the data or instruction to the system memory and then have the requesting processing unit read it back again from the system memory. An intervention can only be performed by a cache having the value in a cache line whose state is Modified or Exclusive. In both of these states, only one cache holds a valid copy of the value, so it is a simple matter to source the value over the bus without the necessity of first writing it to the system memory. The intervention procedure thus speeds up processing by avoiding the longer process of writing to and reading from the system memory (which typically requires three bus operations and two memory operations). Hence, the intervention procedure not only reduces latency but also increases the usable bus bandwidth.
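The eligibility rule for intervention described above can be expressed as a short sketch. The `State` enumeration and the function names are hypothetical, and the Invalid state is an assumption added so that a cache without the line can be represented.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"  # assumption: represents a cache that lacks the line


def can_intervene(state):
    """Only a Modified or Exclusive holder may source the value directly."""
    return state in (State.MODIFIED, State.EXCLUSIVE)


def find_intervening_cache(line_states):
    """Given {cache name: State} for one address, return the cache that can
    source the value, or None if the read must go to system memory."""
    for name, state in line_states.items():
        if can_intervene(state):
            return name
    return None


source = find_intervening_cache({"P0": State.INVALID, "P1": State.MODIFIED})
no_source = find_intervening_cache({"P0": State.SHARED, "P1": State.SHARED})
```

In the first call `P1` sources the value; in the second, both holders are Shared, so no cache intervenes and the value comes from system memory.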
Prior-art cache-coherency protocols do not provide for intervention when data or instructions are held in Shared states by two or more caches because, generally, it is difficult to determine which cache would source the information. Intervention with Shared cache states can be provided if a system collects all of the Shared responses and then selects (e.g., arbitrarily) which cache should source the information. This approach, however, is no faster than first writing and then reading from the system memory, and so it provides no benefit. Also, because instructions (as opposed to data) are never written, the state of any cache block containing a valid instruction is always Shared, and so instructions cannot be sourced by way of intervention.
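The collect-then-select approach mentioned above can be sketched to show where its delay comes from. The function name, the response encoding, and the "lowest name wins" tie-break are assumptions for illustration only; the point is that no source can be chosen until every cache has answered.

```python
def select_shared_source(responses):
    """responses maps cache name -> True (shared), False (not shared),
    or None (has not answered yet). Returns the chosen source or None."""
    if any(answer is None for answer in responses.values()):
        # The selection cannot be made until all responses are collected.
        return None
    holders = sorted(name for name, shared in responses.items() if shared)
    # Arbitrary choice among the Shared holders: take the first by name.
    return holders[0] if holders else None


pending = select_shared_source({"P1": True, "P2": None, "P3": True})
chosen = select_shared_source({"P1": True, "P2": False, "P3": True})
```

With one response outstanding, `pending` is `None`; only once all responses are in does `chosen` resolve to `"P1"`.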
Consequently, it would be desirable to devise an improved cache-coherency protocol that allows for efficient intervention of data held in Shared states. Further, it would also be desirable for such an improved cache-coherency protocol to provide an indication that a cache line is allocated and valid upstream of a given cache level, while undefined at that level, in order to avoid unnecessary bus operations for a sectored cache.