In order to improve the performance of computers having a single central processing unit, computer designers have developed computers which have many central processing units. Often, the central processing units in such multiprocessing computers are connected to each other and to the system's main memory over a single common bus. Recently, however, central processing unit performance is improving at a faster rate than bus performance technology. Faster internal central processor performance results in the need for more external bandwidth. That is, the amount of data transmitted on a common bus must increase to support increased central processing performance. Consequently, the number of central processors which can be connected to a common bus is limited by the bandwidth needed to support the central processors and the total bandwidth of the common bus.
One approach for reducing the bus bandwidth required by each processor in a multiprocessing system has been to place a cache unit between each processor and the common bus. Once data is loaded into a processor's associated cache unit, the processor can access the data in the cache unit without using the common bus. Typically, when a processor obtains data from its cache unit, less data is transmitted over the limited bandwidth of the common bus.
In many cases, a processor will modify a particular data value many times which, in turn, necessitates rewriting the data value back to main memory each time the data value is modified. Rewriting modified data values back to main memory, however, increases the amount of bus bandwidth needed to support a processor. Therefore, if the number of write operations can be reduced, the bus bandwidth required to support a processor can be reduced.
One type of cache unit which reduces the number of write operations is called a "write-back" cache. A write-back cache temporarily stores the modified data values and thus reduces the number of bus transactions needed to write the data values back to main memory. For example, a processor may modify a data value many times in the write-back cache without writing the data back to main memory. The write-back cache ensures that the modified data is eventually written back to main memory.
While write-back caches can be very efficient at reducing the total bus bandwidth required by a multiprocessing system, write-back caches unfortunately create memory coherency problems. For example, each write-back cache contains its own copy of a data value. In such situations, if more than one processor can independently modify a data value, then different versions of the same data value could exist in more than one write-back cache. This would result in erroneous operations, consequently, some mechanism must ensure that all the processors have a consistent view of all data values at all times.
For example, when a processor modifies a data value, the modified data value exists in the write-back cache before it is written back to main memory. In this example, until the write-back cache writes the modified data value back to main memory, the main memory and the other cache units will contain a stale copy of the data value. In order to maintain data integrity, however, the other processors which request the data value must obtain the up-to-date version of the data value, not the stale data value.
The process of ensuring that all the processors have a consistent view of all data values is called cache coherency. One popular and successful set of methods for achieving cache coherency relies on what are called "snooping operations". While a wide variety of snooping operations exist, basically, the snooping operations in a cache unit monitor the bus transactions on the common bus. The snooping operations identify which transactions affect the contents of a cache unit or which transactions relate to modified data existing in a cache unit. Snooping operations typically require that all the processors and their associated cache units share a common bus. Sharing a common bus allows the cache units to monitor the bus transactions and potentially interfere with a bus transaction when a particular cache unit contains a modified data value.
Cache coherency methods also typically utilize coherency status information which indicates whether a particular data value in a cache unit is invalid, modified, shared, exclusively owned, etc. While many cache coherency methods exist, two popular versions include the MESI cache coherency protocol and the MOESI cache coherency protocol. The MESI acronym stands for the Modified, Exclusive, Shared and Invalid states while the MOESI acronym stands for the Modified, Owned, Exclusive, Shared and Invalid states.
The meanings of the states vary from one implementation to another. Broadly speaking, the modified state usually means that a particular cache unit has modified a particular data value. The exclusive state and owned state usually means that a particular cache unit may modify a copy of the data value. The shared state usually means that copies of a data value may exist in different cache units, while the invalid state means that the data value in a cache unit is invalid.
In operation, the cache units snoop the bus operations and use the coherency status information to ensure cache coherency. For example, assume that a first processor having a first cache unit desires to obtain a particular data value. Furthermore, assume that a second processor having a second cache unit contains a modified version of the data value (the coherency status information indicates that the data value in the second cache unit is in the modified state).
In this example, the first processor initiates a read bus request to obtain the data value. The second cache unit snoops the read bus request and determines that it contains the modified version of the data value. The second cache unit then intervenes and delivers the modified data value to the first processor via the common bus. Depending on the system, the modified data value may or may not be simultaneously written to the main memory.
In another example, assume that the first processor desires to exclusively own a particular data value. Furthermore, assume that a second cache unit contains an unmodified, shared copy of the data value (the coherency status information indicates that the data value in the second cache unit is in the shared state). In this example, the first processor initiates a read bus request which requests data for exclusive use.
The second cache unit snoops the read bus request and determines that it contains a shared copy of the data value. The second cache unit then invalidates its shared data value by changing the data value's coherency status information to the invalid state. Changing the data value's coherency status to the invalid state invalidates the data value within the second cache unit. The first processor then completes the read bus request and obtains a copy of the data value from main memory for exclusive use.
While snooping operations maintain cache coherency on multiprocessing systems with a single common bus, more powerful computers contain more than one bus such that each bus interconnects main memory with multiple processors; however, because a common bus has a growing limitation in the number of processors it can support, a multiple-bus system might be necessary to achieve a desired level of performance. A problem associated with multiple buses is that the processors on one bus cannot monitor the transactions initiated by the processors on the other buses. Consequently, the snooping operations cannot maintain memory coherency in multiple-bus computers.
One way to maintain cache coherency in multiple-bus systems is to broadcast the bus transactions initiated on each bus to all the other buses. Unfortunately, this approach results in having the combined bus bandwidth load of all buses transmitted to each bus. As can be expected, this can significantly reduce system performance and obviate the benefit of multiple buses.
A second approach is based on what are called directory-based cache coherency methods. The IEEE Scaleable Coherent Interconnect is an example of a multiple-bus, directory-based cache coherency system. In directory schemes, the processors do not snoop the bus transactions. Rather, the main memory subsystem maintains memory coherency by storing extra information with the actual data.
The extra information in the main memory subsystem typically indicates 1) which processor or processors have obtained a copy of a data value and 2) the coherency status of the data values. For example, the extra information may indicate that more than one processor shares the same data value. In yet another example, the extra information may indicate that only a single processor has the right to modify a particular data value.
When a processor requests a data value, the main memory subsystem determines whether it has an up-to-date version of the data value. If not, the main memory subsystem transfers the up-to-date data value from the processor with the up-to-date data value to the requesting processor. Alternatively, the main memory can indicate to the requesting processor which other processor has the up-to-date data value.
Because the information regarding the location of the up-to-date version of each data value is kept by the main memory subsystem, the processors do not need to "snoop" the bus transactions. Keeping such a directory, however, can add significant cost to a system due to the additional information that must be held for each data value in main memory. In addition, maintaining a directory for each data value in main memory can also degrade system performance due to the time needed to locate and transfer the required data to a requesting processor.
An alternative to directory-based systems would be a bus interconnect which stores the coherency status information associated with the data values which are actually stored in the cache units. Thus, rather than storage which increases proportionally as the main memory increases (as in directory-based schemes), the amount of storage is only related to the much smaller size of the combined cache units. This approach, however, requires the multiple-bus system to store a duplicate copy of the coherency status information associated with all the data values in each of the cache units.
For example, Sun Microsystem's UltraSparc system uses a bus switch to interconnect multiple buses wherein each bus is in communication with processors having internal cache units. The bus switch maintains a duplicate copy of the coherence status information associated with all the data values in the cache units. In the UltraSparc system, the bus switch is capable of maintaining a duplicate copy of the coherency status information because the processors in the UltraSparc system are configured to provide accurate information as to which data value is being replaced allowing an external cache tag can be maintained.
Such a bus switch, however, is not feasible with many off-the-shelf processors because they do not output accurate cache data replacement information. For example, many conventional processors keep accurate coherency status information only within their internal cache units. Thus, other devices cannot determine when a data value is removed from an internal cache unit. Without accurate information about the coherency status information in the internal cache units, a bus switch cannot maintain a duplicate copy of the coherency status information.