Known multiprocessing systems comprise a plurality of processors and a hierarchy of storages including at least two levels of caches which serve as instruction and data buffers between the processors and the main memory. Each processor may have a private level 1 cache, which is usually integrated in the processor chip, and a private level 2 cache on a separate chip. In such systems, extensive communication between the private level 2 caches is required, which places a considerable burden on bus operations.
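The two-level lookup implied above can be illustrated by a minimal sketch; the use of simple dictionaries as caches and the function and field names are assumptions for illustration, not part of any known system:

```python
# Minimal sketch of a two-level private cache lookup: each processor
# first probes its private L1, then its private L2, and only on a
# miss in both does the request go out to main memory over the bus.

def load(address, l1, l2, memory, stats):
    """l1 and l2 are per-processor dicts mapping address -> data."""
    if address in l1:
        return l1[address]            # L1 hit: no bus traffic
    if address in l2:
        l1[address] = l2[address]     # fill L1 from L2
        return l1[address]
    stats["bus_accesses"] += 1        # miss in both: bus operation needed
    data = memory[address]
    l2[address] = data                # fill both cache levels
    l1[address] = data
    return data
```

In this sketch, only the first access to an address generates bus traffic; subsequent accesses by the same processor are served from its private caches.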
Other multiprocessing systems use, besides private level 1 caches, a level 2 cache which is shared among a number of processors and the memory (U.S. Pat. No. 5,490,261). A number of CPUs, each comprising a private L1 cache, is assigned to an L2 cache using interleaves. Each CPU is associated with a request register. An L2 directory is connected to an input priority circuit which receives all requests of the CPUs for access to the L2 cache. The priority circuit selects one request at a time for accessing the L2 cache directory. A high-order field in the selected request selects a row in the directory, and a comparison with an address portion finds any assigned cache directory entry and the associated cache data unit location. Each L2 directory entry contains a CPU identifier field which is set to a value that can identify one CPU as the current exclusive owner of the corresponding data unit in the L2 cache. The known system also uses queue and FIFO storages to hold the incoming requests, each of which contains both address and data, until they are selected by the priority circuit. This also applies to store operations, since these are not handled as requests but as store commands.
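The directory lookup described above can be sketched as follows; the row width, address split, class names and the trivial priority rule are illustrative assumptions, not the implementation of U.S. Pat. No. 5,490,261:

```python
# Sketch of the described L2 directory access: a priority circuit
# selects one CPU request at a time, a high-order field of the
# address selects a directory row, and a tag comparison locates the
# entry, whose CPU-identifier field can mark one CPU as the current
# exclusive owner of the corresponding data unit.

ROW_BITS = 4          # assumed: 16 directory rows
TAG_SHIFT = ROW_BITS  # assumed address split: tag | row index

class DirectoryEntry:
    def __init__(self, tag, cpu_id=None):
        self.tag = tag          # address portion used for comparison
        self.cpu_id = cpu_id    # current exclusive owner, if any

class L2Directory:
    def __init__(self):
        self.rows = {r: [] for r in range(1 << ROW_BITS)}

    def lookup(self, address):
        row = address & ((1 << ROW_BITS) - 1)   # row-select field
        tag = address >> TAG_SHIFT
        for entry in self.rows[row]:
            if entry.tag == tag:                # tag comparison
                return entry
        return None                             # no assigned entry

def select_request(requests):
    """Priority circuit stand-in: pick one pending request at a time.
    Here simply the lowest-numbered CPU; real priority logic differs."""
    return min(requests, key=lambda r: r["cpu"])
```

The one-request-at-a-time selection in `select_request` is what serializes access to the single shared directory.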
Shared level 2 caches (or shared level 3 or level n caches) offer a number of advantages over private L2 caches. Seen from any one of the processing units or CPUs, the cache appears much larger at the same chip cost. Duplicate cache line entries and cross-cache communication are avoided, and network traffic is thus reduced. On the other hand, shared L2 caches also have disadvantages, including the need for arbitration among the processing units which request cache access. This bottleneck increases the cache busy time and, in turn, the queuing time of the multiprocessing tasks. As a consequence, the processor performance and the MP-factor can decrease significantly.
An efficient measure to improve the MP-factor is to split the L2 cache into independent units, such as banks with separate address classes. Mak et al., "Shared-cache clusters in a system with a fully shared memory", IBM Journal of Research and Development, Vol. 41, No. 4/5, July/September 1997, pages 429-448, disclose a multiprocessing system using a shared L2 cache in a cluster design which comprises multiple shared cache clusters, each supporting a number of microprocessors. In an implementation example, three processing units are assigned to a cluster of two independent L2 cache chips, each having its own cache directory. Up to 12 processing units may use up to four L2 cache clusters. Processing unit interface controllers provide fetch and store requests to the L2 cache from three processing units, and a bus switch controller provides the interface to a shared L2.5 cache and the main memory, as well as to the other shared L2 clusters, to support the cross-communication between the independent L2 units as described above.
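The bank-splitting measure mentioned above can be sketched as follows; the bank count, line size and function names are assumptions chosen for illustration, not the arrangement disclosed by Mak et al.:

```python
# Sketch of splitting a shared L2 cache into independent banks with
# separate address classes, so that requests directed to different
# banks can be served concurrently by separate directories.

NUM_BANKS = 2       # e.g. two independent L2 cache chips per cluster
LINE_BITS = 6       # assumed 64-byte cache lines

def bank_of(address):
    """Derive the address class from low-order line-address bits, so
    consecutive cache lines map to alternating banks."""
    return (address >> LINE_BITS) % NUM_BANKS

def partition_requests(addresses):
    """Group pending requests by target bank; each group can then be
    arbitrated and served by its bank's own directory independently."""
    groups = {b: [] for b in range(NUM_BANKS)}
    for a in addresses:
        groups[bank_of(a)].append(a)
    return groups
```

Because arbitration now occurs per bank rather than for the cache as a whole, two requests falling into different address classes no longer serialize against each other, which is the source of the MP-factor improvement.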
It is an object of the invention to improve the efficiency of multiprocessing systems using shared level n caches.
According to another object of the invention, the performance of a shared level n cache design is increased by a higher degree of concurrency in the cache operations.
It is another object of the invention to perform three of the most often used cache operations concurrently.