In some multiprocessor systems, the system's processors have shared memory, which means that the address spaces of the various processors overlap. If the processors utilize cache memories, it is possible for several copies of a particular block of memory to exist concurrently in the caches of different processors. Maintaining "cache coherence" means that whenever data is written into a specified location in a shared address space by one processor, the system determines whether that same memory location is stored in the caches of any other processors, and then updates or otherwise flags those caches. Numerous prior art articles have discussed various aspects of cache coherence. See, for example, Thacker, Stewart, and Satterthwaite, Jr., "Firefly: A Multiprocessor Workstation," IEEE Transactions on Computers, Vol. 37, No. 8, pp. 909-920 (August 1988); and Thacker, Charles P., "Cache Strategies for Shared-Memory Multiprocessors," New Frontiers in Computer Architecture Conference Proceedings, Citicorp/TTI (March 1986), both of which are hereby incorporated by reference.
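The coherence requirement described above can be illustrated with a minimal sketch. It assumes a write-invalidate policy with write-through to main memory, which is only one of several possible policies; the names `Cache` and `coherent_write` are hypothetical and do not come from the cited articles.

```python
class Cache:
    """Hypothetical per-processor cache: maps address -> cached value."""

    def __init__(self):
        self.lines = {}


def coherent_write(writer, others, memory, address, value):
    """Write `value` at `address` and invalidate stale copies elsewhere.

    Assumption: write-through to main memory and invalidation (rather
    than update) of other processors' copies.
    Returns the number of other caches that held the location.
    """
    writer.lines[address] = value
    memory[address] = value  # write-through, assumed for simplicity
    invalidated = 0
    for cache in others:
        # Determine whether the same location is stored in another cache...
        if address in cache.lines:
            # ...and flag that copy; here it is simply dropped.
            del cache.lines[address]
            invalidated += 1
    return invalidated
```

In this sketch, a write by one processor to a location that two processors have cached leaves the writer with the new value and removes the other processor's stale copy.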
The present invention specifically concerns systems which use two-level caches. As CPU (central processing unit) speeds increase, more and more computers are using two-level caches. In a two-level cache, the first level cache is small but very fast and is designed to supply most of the CPU's memory references at CPU speeds. Since in a high-performance CPU the speed disparity between the CPU and the main memory is very large (sometimes on the order of 100:1), a large but somewhat slower second level cache is placed between the first level cache and the main memory system. The first level cache usually contains a subset of the information in the second level cache.
FIG. 1 shows the basic architecture of a multiprocessor system 100. The system 100 has several CPUs 102, 104, 106, and an input/output processor 108, all of which are coupled to a large, but somewhat slow main memory 110 by a shared memory bus 112. Each CPU has a first level cache 120 which is small but very fast, and a second level cache 122 which is larger than the first level cache 120, but somewhat slower.
Using the so-called "rule of 10", the first level cache 120 is typically about ten times as fast as the second level cache 122, and about one tenth the size of the second level cache 122. Similarly, the second level cache 122 is typically about ten times as fast as main memory 110, and about one tenth the size of main memory 110. Of course, these ratios are only ballpark figures. Since caches and main memory tend to have sizes which are equal to a power of two (e.g., 16k or 64k bytes for a cache and 128 Meg for main memory), the actual ratios of cache and memory sizes will usually be powers of two (such as 4, 8, 16 or 32). For example, the first level cache 120 may have a size of 16k bytes with an access time of 10 nanoseconds, the second level cache 122 may have a size of 256k bytes with an access time of 100 nanoseconds, and main memory 110 may have a size of 4,096k bytes with an access time of 500 nanoseconds.
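The ratios implied by the example figures above can be checked with a short calculation. The sizes and access times are taken from the example; the helper name `adjacent_ratios` is hypothetical.

```python
# (name, size in bytes, access time in nanoseconds), from the example above.
levels = [
    ("first level cache", 16 * 1024, 10),
    ("second level cache", 256 * 1024, 100),
    ("main memory", 4096 * 1024, 500),
]


def adjacent_ratios(levels):
    """Return (faster level, slower level, size ratio, access time ratio)
    for each pair of adjacent levels in the hierarchy."""
    ratios = []
    for (name_a, size_a, time_a), (name_b, size_b, time_b) in zip(levels, levels[1:]):
        ratios.append((name_a, name_b, size_b // size_a, time_b / time_a))
    return ratios


for faster, slower, size_ratio, time_ratio in adjacent_ratios(levels):
    print(f"{slower} vs {faster}: {size_ratio}x larger, {time_ratio:g}x slower")
```

With these figures, both size ratios are 16 (a power of two), while the access time ratios are 10 and 5, which shows how loosely the "rule of 10" applies in practice.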
Consider CPU 104 and its caches 120 and 122. When events elsewhere in the system (e.g., the I/O processor 108) cause entries in the second level cache 122 to be invalidated or updated with new data, it is necessary to determine whether or not these entries are also stored in the first level cache 120. Since the first level cache 120 is much smaller than the second level cache 122, it will usually be the case that the information is not present in the first level cache 120. In addition, accessing the first level cache 120 to determine the presence or absence of a data entry interferes with accesses by the CPU 104 and reduces overall system performance.
The present invention makes it possible for the second level cache's control logic to determine whether a datum exists in the first level cache so that useless accesses to the first level cache can be avoided.
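The benefit of letting the second level cache decide whether the first level cache needs to be accessed can be illustrated with a minimal sketch. The mechanism shown here, a set of addresses that the second level cache believes are also held in the first level cache, is an assumption made purely for illustration and is not a description of the mechanism of the invention; the names `SecondLevelCache`, `in_l1`, and `l1_probes` are hypothetical.

```python
class SecondLevelCache:
    """Hypothetical second level cache that tracks first level contents."""

    def __init__(self):
        self.data = {}      # address -> value held in the second level cache
        self.in_l1 = set()  # assumed record of addresses also held in L1
        self.l1_probes = 0  # number of times L1 was actually accessed

    def note_l1_fill(self, address):
        # Called when the first level cache loads this address.
        self.in_l1.add(address)

    def note_l1_evict(self, address):
        # Called when the first level cache drops this address.
        self.in_l1.discard(address)

    def invalidate(self, address, l1):
        """Invalidate `address`; access L1 only if it may hold the datum.

        `l1` is a dict modelling the first level cache.
        Returns True if the first level cache had to be accessed.
        """
        self.data.pop(address, None)
        if address not in self.in_l1:
            return False  # useless L1 access avoided
        self.l1_probes += 1
        l1.pop(address, None)
        self.in_l1.discard(address)
        return True
```

In this sketch, an invalidation caused by another processor or by the I/O processor disturbs the first level cache only when the tracked record indicates the datum is present there, so the CPU's own accesses are otherwise left uninterrupted.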