In general, prior art cache coherency protocols can be divided into shared resource methods and directory-based protocols. In shared resource methods, coherency is maintained by ensuring that all address operations are visible to every cache in the system. This is accomplished by using shared address buses and by requiring inclusion in the lower-level caches that attach to those buses. The other approach is to maintain either global or distributed directories that record where information is located throughout the system.
Traditional shared memory multiprocessing systems have been designed with a single shared address bus for all memory-coherent requesters. The drawback of this approach is that system throughput is limited by this single address bus. Other shared memory multiprocessor systems use a single switching unit, which imposes similar limitations on address bandwidth.
U.S. Pat. No. 4,755,930 issued to Wilson, Jr. et al. ("Wilson") discloses a method for hierarchical cache memory coherency, but requires that a cache write-through be used, i.e., every cache write operation updates the corresponding location in central memory. Wilson maintains coherency by requiring both cache inclusion and write-through of all cache writes to central memory. Requiring cache inclusion uses cache space inefficiently, and using write-through uses interconnection bandwidth inefficiently. Thus, a need exists for a coherency method that requires neither inclusion nor write-through. The present invention addresses such a need.
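The bandwidth cost attributed to write-through can be illustrated by counting the write messages sent toward central memory. The following Python sketch is an idealized model and not part of Wilson's disclosure: the function name `interconnect_writes`, the 64-byte line size, and the assumption that the cache is large enough that no dirty line is evicted before the end of the trace are all hypothetical. Read traffic and coherence messages are ignored.

```python
from typing import Iterable, Tuple


def interconnect_writes(stores: Iterable[int], line_size: int = 64) -> Tuple[int, int]:
    """Hypothetical model: count memory-bound writes for a trace of store addresses.

    Returns (write_through_messages, write_back_messages).
    - Write-through: every store is forwarded to central memory.
    - Write-back (idealized): each dirty line is written back exactly once,
      assuming no line is evicted before the trace ends.
    """
    write_through = 0
    dirty_lines = set()
    for addr in stores:
        write_through += 1                  # each store crosses the interconnect
        dirty_lines.add(addr // line_size)  # line becomes dirty in a write-back cache
    write_back = len(dirty_lines)           # one write-back per distinct dirty line
    return write_through, write_back
```

For example, four stores to one 64-byte line followed by two stores to the next line, `interconnect_writes([0, 8, 16, 24, 64, 72])`, yield six write-through messages versus two write-backs.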
A review of directory-based approaches to cache coherency can be found in "Directory-Based Cache Coherence in Large-Scale Multiprocessors", by Chaiken, D.; Fields, C.; Kurihara, K.; and Agarwal, A., Computer Magazine, June 1990, pp. 49-58. These prior art directory schemes are classified into three categories: full-map directories, limited directories, and chained directories. A full-map directory provides one bit per processor for each cache line, indicating whether that processor holds the line. A limited directory allows only a limited number of processors to hold a given cache line. A chained directory uses a linked list of the processors that hold a given cache line. A need exists for a method that does not require a directory entry for every processor, does not limit the number of cached copies, and does not have the latency or complexity associated with a linked list system. The present invention addresses such a need.
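The three directory organizations described above can be illustrated with a minimal Python sketch of the per-line directory entry in each scheme. The class and method names, and the policy that a full limited directory simply refuses a new sharer, are assumptions made for illustration only; an actual limited-directory protocol might instead invalidate an existing copy or fall back to broadcast.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FullMapEntry:
    """Full-map: one presence bit per processor for a single cache line."""
    num_processors: int
    bits: int = 0  # bit p set => processor p holds a copy

    def add_sharer(self, p: int) -> bool:
        self.bits |= 1 << p
        return True  # a full map can always record another sharer

    def sharers(self) -> List[int]:
        return [p for p in range(self.num_processors) if (self.bits >> p) & 1]


@dataclass
class LimitedEntry:
    """Limited: at most max_pointers processor IDs are recorded per line."""
    max_pointers: int
    pointers: List[int] = field(default_factory=list)

    def add_sharer(self, p: int) -> bool:
        if p in self.pointers:
            return True
        if len(self.pointers) >= self.max_pointers:
            return False  # hypothetical policy: caller must free a pointer first
        self.pointers.append(p)
        return True

    def sharers(self) -> List[int]:
        return list(self.pointers)


@dataclass
class ChainNode:
    processor: int
    next: Optional["ChainNode"] = None


@dataclass
class ChainedEntry:
    """Chained: the directory keeps only the list head; holders link onward."""
    head: Optional[ChainNode] = None

    def add_sharer(self, p: int) -> bool:
        if p in self.sharers():
            return True
        self.head = ChainNode(p, self.head)  # new holder becomes the list head
        return True

    def sharers(self) -> List[int]:
        # Walking the chain costs one hop per holder, the latency noted above.
        out, node = [], self.head
        while node is not None:
            out.append(node.processor)
            node = node.next
        return out
```

With a two-pointer `LimitedEntry`, adding processors 1 and 5 succeeds but adding processor 7 is refused, whereas `FullMapEntry` and `ChainedEntry` accept any number of sharers at the cost of storage or traversal latency, respectively.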
U.S. Pat. No. 5,297,269 issued to Donaldson et al. discloses a multiprocessor cache coherency protocol that uses a global full-map directory scheme in central memory. All memory accesses perform lookups in this single directory, which may become a system bottleneck. A method without the limitation of a single directory is desired. The present invention addresses such a need.
U.S. Pat. No. 5,313,609 issued to Baylor et al. discloses a coherency method that uses a global directory. This method shares the limitation of the Donaldson patent: access bandwidth to the single directory is limited. The single global directory also prevents the system from scaling, since both the directory size and the access traffic to the directory would have to grow with the system. A method without a global directory is desired. The present invention addresses such a need.
The use of distributed directories avoids some of the limitations associated with a global directory. An example of the full-map distributed directory approach is described by D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy, in "The DASH Prototype: Logic Overhead and Performance," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 1, January 1993, pp. 41-61 ("Lenoski"). Lenoski's system maintains a full-map directory covering all the processors, with one directory bit per cache line for every processor in the system. Thus, in the DASH system, every directory contains a number of bits equal to the number of cache lines times the number of processors in the system. A need therefore exists for a cache coherency directory that is smaller and does not require an entry for every processor in the system in every directory. The present invention addresses such a need.
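The storage cost of a full-map directory can be made concrete with a short calculation. The sketch below uses two hypothetical helpers: `full_map_bits`, which applies the cache-lines-times-processors relationship stated above, and, for comparison only, `limited_pointer_bits`, which assumes a limited directory storing a fixed number of binary-encoded processor IDs per line. Neither helper is taken from Lenoski.

```python
def full_map_bits(cache_lines: int, processors: int) -> int:
    """Full-map directory: one presence bit per processor for every cache line."""
    return cache_lines * processors


def limited_pointer_bits(cache_lines: int, processors: int, pointers: int) -> int:
    """Hypothetical limited directory: `pointers` processor IDs per cache line.

    Each ID needs ceil(log2(processors)) bits, computed exactly with integers.
    """
    id_bits = max(1, (processors - 1).bit_length())
    return cache_lines * pointers * id_bits


# Per-line cost as the processor count grows (illustrative configurations).
for p in (16, 64, 256):
    print(p, full_map_bits(1, p), limited_pointer_bits(1, p, pointers=4))
```

Per cache line, the full-map cost grows linearly with the processor count (16, 64, and 256 bits), while four pointers grow only logarithmically (16, 24, and 32 bits); the limited scheme pays for this by capping the number of cached copies.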
A distributed directory system in which the directories are associated with hierarchical switch elements that also contain caches is disclosed by H. Mizrahi, E. Baer, and J. Zahorjan, in "Extending the Memory Hierarchy into Multiprocessor Interconnection Networks: A Performance Analysis," 1989 International Conference on Parallel Processing, pp. I-41-I-50 ("Mizrahi"). Mizrahi's system arranges the caches in a tree structure connected to a global memory. The directories in Mizrahi's system record the switch port that provides a path to the desired cached copy. The limitation of the Mizrahi system is that only a single copy of each memory location may be present in the system at any time. Mizrahi thereby avoids the problem of coherency enforcement entirely. While Mizrahi's scheme is simple to implement, it may suffer performance problems when data read by multiple processors must be shared. A method is needed that overcomes this single-copy limitation. The present invention addresses such a need.
Omran and Aboelaze build on the work of Mizrahi and describe a multistage switch network with caches and directories in the switches ("Omran"). See R. Omran and M. Aboelaze, "An Efficient Single Copy Cache Coherence Protocol for Multiprocessors with Multistage Interconnection Networks" in Proceedings of the 1994 Conference for Scalable High Performance Computing, pp. 1-8, May 1994. Omran also relies on a single-copy requirement to prevent cache coherency problems. Cache inclusion is not required across the different cache levels, since only one copy is allowed in the system. Thus, a need still exists for a cache coherency protocol without a global directory that allows multiple copies in a switch-based network but does not require each switch directory to track every processor. The present invention addresses such a need.
Omran and Lee also give a brief survey of the state of the art in coherence methods for multistage interconnection networks. See R. Omran and D. Lee, "A Multi-cache Coherence Scheme for Shuffle-Exchange Network based Multiprocessors," Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation, February 1995 ("Omran and Lee"). The techniques surveyed include the full-map directory, which provides a bit for each cache in the system and is not scalable. Omran and Lee conclude that when directory schemes are used in multistage interconnection networks, immense coherence traffic is generated, causing increased contention and latency. Omran and Lee then propose a Shuffle Network topology that allows only a single copy. A solution is desired that has no single-copy limitation and that avoids the large amounts of coherence traffic typically expected in such a system. The present invention addresses such a need.