1. Field of the Invention
This invention relates generally to a cache coherency scheme for a large-scale symmetrical multiprocessor system; and, more specifically, to an improved directory-based cache coherency scheme for supporting one or more instruction processors and one or more input/output processors which are each coupled to a shared main memory and which are each capable of storing predetermined data signals retrieved from the shared main memory.
2. Description of the Prior Art
Data processing systems are becoming increasingly complex. Some systems, such as Symmetric Multi-Processor (SMP) computer systems, couple two or more Instruction Processors (IPs) and multiple Input/Output (I/O) Modules to shared memory. This allows the multiple IPs to operate simultaneously on the same task, and also allows multiple tasks to be performed at the same time to increase system throughput.
As the number of units coupled to a shared memory increases, more demands are placed on the memory and memory latency increases. To address this problem, high-speed cache memory systems are often coupled to one or more of the IPs for storing data signals that are copied from main memory. These cache memories are generally capable of processing requests faster than the main memory while also serving to reduce the number of requests that the main memory must handle. This increases system throughput.
While the use of cache memories increases system throughput, it introduces other design challenges. When multiple cache memories are coupled to a single main memory for the purpose of temporarily storing data signals, some mechanism must be utilized to ensure that all IPs are working from the same (most recent) copy of the data. For example, if a copy of a data item is stored, and subsequently modified, in a cache memory, another IP requesting access to the same data item must be prevented from using the older copy of the data item stored either in main memory or the requesting IP's cache. This is referred to as maintaining cache coherency. Maintaining cache coherency becomes more difficult as more caches are added to the system since more copies of a single data item may have to be tracked.
Many methods exist to maintain cache coherency. Some earlier systems achieved coherency by implementing memory locks. That is, if an updated copy of data existed within a local cache, other processors were prohibited from obtaining a copy of the data from main memory until the updated copy was returned to main memory, thereby releasing the lock. For complex systems, the additional hardware and/or operating time required for setting and releasing the locks within main memory cannot be justified. Furthermore, reliance on such locks directly prohibits certain types of applications such as parallel processing.
Another method of maintaining cache coherency is shown in U.S. Pat. No. 4,843,542 issued to Dashiell et al., and in U.S. Pat. No. 4,755,930 issued to Wilson, Jr. et al. These patents discuss a system wherein each processor has a local cache coupled to a shared memory through a common memory bus. Each processor is responsible for monitoring, or "snooping", the common bus to maintain currency of its own cache data. These snooping protocols increase processor overhead, and are unworkable in hierarchical memory configurations that do not have a common bus structure. A similar snooping protocol is shown in U.S. Pat. No. 5,025,365 to Mathur et al., which teaches local caches that monitor a system bus for the occurrence of memory accesses which would invalidate a local copy of data. The Mathur snooping protocol removes some of the overhead associated with snooping by invalidating data within the local caches at times when data accesses are not occurring. However, the Mathur system is still unworkable in memory systems without a common bus structure.
Another method of maintaining cache coherency is shown in U.S. Pat. No. 5,423,016 to Tsuchiya. The method described in this patent involves providing a memory structure called a "duplicate tag" with each cache memory. The duplicate tags record which data items are stored within the associated cache. When a data item is modified by a processor, an invalidation request is routed to all of the other duplicate tags in the system. The duplicate tags are searched for the address of the referenced data item. If found, the data item is marked as invalid in the other caches. Such an approach is impractical for distributed systems having many caches interconnected in a hierarchical fashion because the time required to route the invalidation requests poses an undue overhead.
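By way of illustration only, the duplicate-tag approach described above may be sketched as follows. This is a simplified model and not the implementation of the referenced patent; the names `DuplicateTag` and `broadcast_invalidate` are hypothetical. It is intended only to show that every modification causes every other duplicate tag to be searched, so the work grows with the number of caches.

```python
from typing import Dict, Set, Tuple


class DuplicateTag:
    """Records which addresses are held by one associated cache (hypothetical model)."""

    def __init__(self) -> None:
        self.valid: Set[int] = set()

    def record_fill(self, addr: int) -> None:
        # The associated cache has stored a copy of the data item at addr.
        self.valid.add(addr)

    def invalidate(self, addr: int) -> bool:
        # Search for addr; if found, mark the copy invalid and report a hit.
        if addr in self.valid:
            self.valid.remove(addr)
            return True
        return False


def broadcast_invalidate(tags: Dict[str, DuplicateTag], writer: str, addr: int) -> Tuple[int, int]:
    """Route an invalidation request to every duplicate tag except the writer's.

    Returns (tags searched, copies invalidated).
    """
    searched = 0
    invalidated = 0
    for cache_id, tag in tags.items():
        if cache_id == writer:
            continue
        searched += 1
        if tag.invalidate(addr):
            invalidated += 1
    return searched, invalidated
```

In this sketch the number of tags searched is always one less than the number of caches, regardless of how few caches actually hold the data item, which reflects the routing overhead noted above.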
For distributed systems having hierarchical memory structures, a directory-based coherency system becomes more practical. Directory-based coherency systems utilize a centralized directory to record the location and the status of data as it exists throughout the system. For example, the directory records which caches have a copy of the data, and further records if any of the caches have an updated copy of the data. When a cache makes a request to main memory for a data item, the central directory is consulted to determine where the most recent copy of that data item resides. Based on this information, the most recent copy of the data is retrieved so it may be provided to the requesting cache. The central directory is then updated to reflect the new status for that unit of memory. A novel directory-based cache coherency system for use with multiple Instruction Processors coupled to a hierarchical cache structure is described in the copending application entitled "A Directory-Based Cache Coherency System", U.S. patent application Ser. No. 08/965,004, assigned to the Assignee hereof, which is incorporated herein by reference in its entirety.
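The directory lookup and update sequence described above may be sketched as follows. This is a minimal model under stated assumptions and does not represent the system of the copending application: the entry layout (`sharers`, `owner`), the method names, and the policy that a former owner keeps a read copy after its updated data is retrieved are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirectoryEntry:
    """Directory record for one unit of memory (hypothetical layout)."""
    sharers: Set[str] = field(default_factory=set)  # caches holding a copy
    owner: Optional[str] = None                     # cache holding an updated copy


class Directory:
    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def record_read(self, addr: int, cache_id: str) -> str:
        """Consult the directory for a read request; return where the latest copy resides."""
        entry = self.entries.setdefault(addr, DirectoryEntry())
        source = entry.owner if entry.owner is not None else "main_memory"
        if entry.owner is not None:
            # Assumed policy: the updated copy is retrieved and the
            # former owner keeps a read copy.
            entry.sharers.add(entry.owner)
            entry.owner = None
        entry.sharers.add(cache_id)
        return source

    def record_write(self, addr: int, cache_id: str) -> Set[str]:
        """Record cache_id as holder of the updated copy; return caches whose copies are stale."""
        entry = self.entries.setdefault(addr, DirectoryEntry())
        stale = set(entry.sharers)
        if entry.owner is not None:
            stale.add(entry.owner)
        stale.discard(cache_id)
        entry.sharers = set()
        entry.owner = cache_id
        return stale
```

Unlike a broadcast scheme, only the caches named in the directory entry need to be contacted when a data item is modified.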
Although the foregoing discussion addresses the memory latency and cache coherency issues associated with cache memories coupled to Instruction Processors, it does not consider the problems associated with coupling an increased number of Input/Output (I/O) units to memory in an SMP system. As the number of I/O units in the system increases, it becomes desirable to allow these I/O units to maintain copies of memory data for either read or read/write purposes. This may be accomplished by coupling one or more of the I/O units to shared I/O cache memories or other I/O buffers.
The use of I/O caches and internal I/O buffers for storing copies of data obtained from a shared main memory poses some unique considerations. In some instances, it is desirable to handle this stored I/O data in a manner which is similar to the manner in which cached IP data is handled. That is, the location and state of the cached data within the I/O cache should be tracked and controlled by the main memory. In other instances, it is desirable to handle data provided to an I/O unit differently from the data copies maintained in the IP caches. For example, data may be retrieved from main memory by an I/O unit so that the copy of memory may be written to an I/O sub-system such as a disk unit. Since this copy is just considered a "snapshot" in time of the state of a portion of the main memory, there is no reason to track the copy for coherency purposes. In another situation, a block of memory data may be retrieved from main memory and stored in an I/O buffer so that data received from an I/O sub-system may be selectively merged into the stored data. When the modification is complete, the modified block of data is written back to the main memory. In these instances, the I/O unit must retain the block of data long enough to complete the merge operation, after which the data must be returned to the main memory. It would unnecessarily increase system overhead to require the shared main memory to attempt to retrieve the block of data before the merge is completed. For these reasons, different coherency restrictions should be imposed on those copies of data items stored within I/O units as compared to copies of data items stored within an IP cache.
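The selective merge described above may be illustrated by the following sketch. The function name `merge_into_block` and the use of a per-byte mask to select which received bytes replace stored bytes are assumptions made for illustration; the text does not specify how the merge is selected.

```python
from typing import Sequence


def merge_into_block(block: bytes, incoming: bytes, mask: Sequence[bool]) -> bytes:
    """Merge bytes received from an I/O sub-system into a buffered block.

    Where mask[i] is True the received byte replaces the stored byte;
    elsewhere the byte retrieved from main memory is kept.
    """
    if not (len(block) == len(incoming) == len(mask)):
        raise ValueError("block, incoming data, and mask must have equal length")
    return bytes(new if take else old for old, new, take in zip(block, incoming, mask))
```

The I/O unit would hold the block in its buffer for the duration of this operation and write the merged result back to main memory afterwards.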
In addition to the above-described I/O coherency issues, coupling both I/O and IP caches to a shared main memory increases the complexity associated with error detection and prevention. The error handling mechanism must be able to ensure that only legal copies of data items are stored within the various system memories.
Finally, today's complex systems may include multiple heterogeneous instruction processors. Not all of the instruction processors coupled to the same shared main memory necessarily operate on blocks of cached data that are of the same size. As such, it is desirable to have a coherency system that allows various instruction processors within a system to modify data on different memory boundaries.
Prior art directory-based coherency systems provide coherency among cache memories coupled to instruction processors, but do not track data items stored within I/O memories, and do not address the unique considerations posed by the manipulation of memory data by I/O units. Additionally, prior art systems do not provide for the modification of memory data on various memory boundaries. Finally, prior art systems do not provide the error checking necessary to maintain coherency in large complex systems coupling many instruction processors and I/O units to a common main memory.
The primary object of the invention is to provide an improved control system for a directory-based cache coherency system;
A further object of the invention is to provide a directory-based cache coherency system that is capable of maintaining coherency in a system having both Instruction Processors (IPs) and Input/Output (I/O) units coupled to a shared main memory;
Another object of the invention is to provide a coherency system capable of supporting an expandable number of cache memories and an expandable number of I/O units;
A yet further object of the invention is to provide a directory-based coherency mechanism that allows IPs to cache data read from the shared main memory;
A still further object of the invention is to provide a directory-based coherency mechanism that allows an I/O unit to provide a copy of a data item to a coupled I/O sub-system while allowing the data item to remain stored in either an IP cache or a different I/O unit;
A still further object of the invention is to provide a cache coherency system that permits an I/O unit to maintain a copy of a data item within a buffer until a merge operation is completed and the I/O unit returns the data item to the main memory;
Another object of the invention is to provide a cache coherency system that maintains cache coherency when one or more I/O units overwrite data in the shared main memory system that is stored within one or more of the cache memories coupled to the shared main memory system;
A further object of the invention is to provide a cache coherency system that supports memory modifications of various block sizes within the shared main memory;
A still further object of the invention is to provide a coherency system that provides improved corruption detection for data stored within the shared main memory system.
The objectives of the present invention are achieved in a directory-based cache coherency system for use in a data processing system having multiple Instruction Processors (IPs) and multiple Input/Output (I/O) units coupled through a shared main memory. The system includes one or more IP cache memories, each coupled to one or more IPs and to the shared main memory for caching units of data referred to as "cache lines" from shared main memory. The system further includes one or more I/O memories within ones of the I/O units, each coupled to the shared main memory for storing cache lines. Coherency is maintained through the use of a central directory which maintains status on each of the cache lines in the system. The status indicates the identity of the IP caches and the I/O memories that store copies of a given cache line. The status further identifies a set of access privileges (for example, read or read/write privileges), referred to as a cache line "state", that is associated with the given cache line.
An IP cache or I/O memory obtains a copy of a cache line and an associated set of access privileges by issuing one of a predetermined allowable set of commands to the shared main memory. A command may cause a requested cache line to transition to a new state. The command may also cause the requested cache line to be marked as invalid because the command was not considered a valid request based on the then-existing state of the requested cache line. The predetermined cache line states and the set of allowable commands therefore define a state machine which serves to ensure only valid copies of a cache line are maintained within the memory system.
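The relationship between cache line states and allowable commands may be sketched as a transition table. The states and commands below are hypothetical placeholders chosen for illustration; this summary does not enumerate the actual states or commands. The sketch shows only the general principle that a command either moves a cache line to a new state or, if it is not a valid request in the current state, causes the cache line to be marked as invalid.

```python
from enum import Enum


class State(Enum):
    # Hypothetical cache line states, for illustration only.
    MEMORY_OWNS = "memory_owns"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"
    INVALID = "invalid"


class Command(Enum):
    # Hypothetical commands, for illustration only.
    FETCH_READ = "fetch_read"
    FETCH_WRITE = "fetch_write"
    RETURN = "return"


# Allowable (state, command) pairs and the resulting state.
TRANSITIONS = {
    (State.MEMORY_OWNS, Command.FETCH_READ): State.SHARED,
    (State.MEMORY_OWNS, Command.FETCH_WRITE): State.EXCLUSIVE,
    (State.SHARED, Command.FETCH_READ): State.SHARED,
    (State.SHARED, Command.FETCH_WRITE): State.EXCLUSIVE,
    (State.EXCLUSIVE, Command.FETCH_READ): State.SHARED,
    (State.EXCLUSIVE, Command.FETCH_WRITE): State.EXCLUSIVE,
    (State.EXCLUSIVE, Command.RETURN): State.MEMORY_OWNS,
}


def apply_command(state: State, command: Command) -> State:
    """Return the next state; a pair missing from the table marks the line invalid."""
    return TRANSITIONS.get((state, command), State.INVALID)
```

In this sketch, returning a cache line that no unit holds with write privileges is not an allowable request, so the line is marked invalid rather than silently accepted.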
The coherency system of the current invention allows copies of the cache lines stored in IP caches to be handled differently than some copies provided to I/O units. The central directory always tracks cache line data stored in IP caches. Furthermore, the shared main memory always requires that an IP cache return a modified copy of a cache line to the shared main memory when other requests are received for that cache line. In contrast, the central directory does not track cache lines that are provided to I/O units to be saved as a snapshot copy on an I/O sub-unit such as a disk. Moreover, I/O units are allowed to retain data for write purposes until a write operation is completed. That is, the main memory does not force an I/O unit to relinquish a cache line because another unit is making a request for that cache line. By distinguishing between IP caches and I/O memories in this manner, main memory is not forced to perform data tracking or data retrieval functions that are not necessary, which improves memory efficiency.
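The differing treatment of IP caches and I/O units may be sketched as follows. The class and method names, the return strings, and the decision to model a waiting request as "deferred" are assumptions made for illustration and are not taken from the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class LineStatus:
    readers: Set[str] = field(default_factory=set)
    writer: Optional[Tuple[str, str]] = None  # (kind, unit_id); kind is "IP" or "IO"


class MainMemory:
    def __init__(self) -> None:
        self.lines: Dict[int, LineStatus] = {}

    def _line(self, addr: int) -> LineStatus:
        return self.lines.setdefault(addr, LineStatus())

    def io_snapshot(self, addr: int, io_id: str) -> None:
        """Provide a snapshot copy to an I/O unit; the directory is left unchanged."""
        self._line(addr)

    def request_write(self, addr: int, kind: str, unit_id: str) -> str:
        """Request a cache line with write privileges."""
        line = self._line(addr)
        if line.writer is not None and line.writer[0] == "IO":
            # An I/O unit is not forced to relinquish the line; the request waits.
            return "deferred"
        # An IP cache holding the line must return its copy on request.
        result = "granted_after_recall" if line.writer is not None else "granted"
        line.readers.clear()
        line.writer = (kind, unit_id)
        return result

    def release(self, addr: int) -> None:
        """The current writer returns the cache line to main memory."""
        self._line(addr).writer = None
```

A usage sequence under these assumptions: a snapshot leaves no directory record, an IP writer is recalled when an I/O unit requests the line, and a later IP request waits until the I/O unit releases the line.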
The memory coherency system of the current invention further includes commands to allow the shared main memory to be modified on other than strictly cache line boundaries. The system is therefore capable of supporting multiple heterogeneous instruction processors, not all of which necessarily operate on blocks of cached data that are of the same size.
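A write that does not fall on cache line boundaries may be sketched as follows. The 64-byte cache line size, the class name `LineMemory`, and the `write` method are assumptions made for illustration; the actual commands and cache line size are not given in this summary. The sketch shows only that such a write can be mapped to the cache lines it touches, so that coherency actions can be directed at those lines.

```python
from typing import List

CACHE_LINE_BYTES = 64  # assumed size, for illustration only


class LineMemory:
    """Byte-addressable memory organized into fixed-size cache lines (hypothetical)."""

    def __init__(self, n_lines: int) -> None:
        self.data = bytearray(n_lines * CACHE_LINE_BYTES)

    def write(self, addr: int, payload: bytes) -> List[int]:
        """Write payload at any byte address; return indices of the cache lines touched."""
        if addr < 0 or addr + len(payload) > len(self.data):
            raise ValueError("write falls outside memory")
        if not payload:
            return []
        self.data[addr:addr + len(payload)] = payload
        first = addr // CACHE_LINE_BYTES
        last = (addr + len(payload) - 1) // CACHE_LINE_BYTES
        return list(range(first, last + 1))
```

For example, an 8-byte write starting at byte 60 touches cache lines 0 and 1, while an 8-byte write starting at byte 8 touches only cache line 0.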