(1) Field of the Invention
The present invention relates to a multiprocessing apparatus, and particularly to a technology that is effective when applied to the case where any one of the local caches of a plurality of processors in a semiconductor chip is used.
(2) Description of the Related Art
In a conventional symmetric multiprocessor, the respective local caches of the processors are connected via a shared bus that is connected to a shared memory. According to a typical cache control method for such a conventional symmetric multiprocessor, even when one CPU performs a cache refill to its local cache, another CPU performs no cache refill to its own local cache. This is because each local cache performs a cache-to-cache data transfer only for the purpose of maintaining data coherence among the caches, and unnecessary data might be stored into the cache if such another CPU also performed a cache refill. Despite this, under the management of a typical OS that supports multiprocessors, such as Linux, task scheduling is performed on the assumption that each task is executed on an arbitrary CPU. In other words, since another CPU performs no cache refill even when one CPU performs a cache refill, a cache miss occurs at the point in time when a task is assigned to such another CPU in task scheduling, although the same cache access would result in a cache hit in a uniprocessor architecture.
The following should be referred to as documents disclosing technologies related to the present invention:
Japanese Laid-Open Patent Application No. S63-240649 (FIG. 1);
Japanese Laid-Open Patent Application No. H05-197622 (FIG. 1); and
John L. Hennessy & David A. Patterson, “Computer Architecture: A Quantitative Approach, Third Edition,” Chapter Six, “Multiprocessors and Thread-Level Parallelism,” Snooping Protocols, p. 551.
However, a typical multiprocessor snoopy cache system is considered to be inferior to a uniprocessor cache system in terms of cache locality. This is because, in terms of hardware control, target data is stored into the local cache of a CPU when that CPU accesses the data, but the same data is not stored into the local cache of any other CPU. In terms of software control (e.g. Linux), on the other hand, tasks are typically assigned to CPUs on a dynamic basis. In other words, during the period from the generation of a new task to its completion, a single task may be executed as many times as there are CPUs, or more, potentially on a different CPU each time. Since refill requests to the external memory, as well as penalties attributable to inter-cache data sharing, might occur an equivalent number of times, it is estimated that the cache miss ratio attributable to cache locality is higher in a multiprocessor than in a uniprocessor.
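The effect of dynamic task assignment on cache misses described above can be illustrated with a minimal sketch. The function name `count_misses`, the unbounded per-CPU caches, the read-only working set, and the schedules used are all assumptions made for illustration; they are not taken from the cited documents.

```python
def count_misses(schedule, addresses, num_cpus):
    """Count local-cache misses for one task under a given CPU schedule.

    schedule:  CPU index on which the task runs in each time slice.
    addresses: the task's working set, touched once per time slice.
    Assumptions (hypothetical): local caches never evict, data is read-only,
    and a refill goes only into the cache of the CPU that missed.
    """
    caches = [set() for _ in range(num_cpus)]  # one local cache per CPU
    misses = 0
    for cpu in schedule:
        for addr in addresses:
            if addr not in caches[cpu]:
                misses += 1            # refill or cache-to-cache transfer needed
                caches[cpu].add(addr)  # only this CPU's cache receives the data
    return misses


working_set = list(range(8))

# Uniprocessor: the task always runs on the same CPU.
uni_misses = count_misses([0, 0, 0, 0], working_set, num_cpus=1)

# Multiprocessor: the scheduler moves the task to a different CPU each slice.
smp_misses = count_misses([0, 1, 2, 3], working_set, num_cpus=4)

print(uni_misses, smp_misses)
```

Under these assumptions the uniprocessor misses only on the first time slice, whereas the four-CPU schedule misses on the whole working set once per CPU, so the miss count grows with the number of CPUs visited. The sketch counts misses only; it does not distinguish whether a miss is served from the external memory or from another local cache.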
Even under the above circumstances, however, the multiprocessor system adopts a method for reducing the number of cache miss penalties by causing the respective local caches to perform cache-to-cache data transfers and cache refills as much as possible. Consequently, no serious problem attributable to cache-to-cache data transfer occurs in the case where the multiprocessor has two CPUs and the number of penalty cycles in a cache-to-cache data transfer is the same as the number of instruction execution cycles required at the time of a local cache hit.
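The condition stated above, in which a cache-to-cache data transfer costs the same number of cycles as a local cache hit, can be illustrated with a simple average-cost model. The function name `average_access_cycles`, the probabilities, and all cycle counts below are hypothetical values chosen only for illustration.

```python
def average_access_cycles(hit_cycles, transfer_cycles, memory_cycles,
                          p_local_hit, p_remote_hit):
    """Expected cycles per access in a simple three-outcome model.

    An access either hits in the local cache, is served by a cache-to-cache
    transfer from another local cache, or is refilled from external memory.
    All parameters are hypothetical inputs, not measured values.
    """
    p_memory = 1.0 - p_local_hit - p_remote_hit
    return (p_local_hit * hit_cycles
            + p_remote_hit * transfer_cycles
            + p_memory * memory_cycles)


# Case A: a cache-to-cache transfer costs the same as a local hit (1 cycle).
equal_cost = average_access_cycles(hit_cycles=1, transfer_cycles=1,
                                   memory_cycles=100,
                                   p_local_hit=0.5, p_remote_hit=0.5)

# Case B: a cache-to-cache transfer costs 10 cycles instead.
higher_cost = average_access_cycles(hit_cycles=1, transfer_cycles=10,
                                    memory_cycles=100,
                                    p_local_hit=0.5, p_remote_hit=0.5)

print(equal_cost, higher_cost)
```

In Case A, data found in the other CPU's cache is as cheap to access as data in the local cache, so the average cost equals the hit cost. In Case B, the same access pattern becomes several times more expensive, which suggests why the transfer penalty matters once it exceeds the local hit cost.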
As is apparent from the above, there is a need to improve cache locality in the multiprocessor architecture.