A cache device is an effective means of shortening memory access time in a computer. A small-capacity, high-speed memory called a cache memory is added to a processor, and data that has been accessed once is stored in that memory. Thus, when the data is next accessed, it can be returned at high speed.
FIG. 1 illustrates a cache mechanism of a multiprocessor system. Processors 102-1, 102-2, to 102-n have cache devices 100-1, 100-2, to 100-n, respectively, and these cache devices are connected to a main memory 108 through an interconnecting network 106. For the cache device to work effectively, it is desirable that as much as possible of the data required by the processor be stored in the cache. In other words, if the data required by the processor is frequently absent from the cache, the low-speed main memory must be accessed many times, and the average memory access time becomes longer. Particularly in a multiprocessor system, plural processors access the same memory; access conflicts therefore occur, and the average access speed to the main memory is lowered further. For this reason, storing the data required by the processor in the cache device is a very important issue in computer systems that use cache devices.
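The dependence of the average access time on the hit ratio, as described above, can be illustrated with a minimal sketch. The function name, the simple cost model (every access pays the cache lookup, and a miss additionally pays one main-memory access), and the latency figures are assumptions chosen for illustration; they are not values taken from this description.

```python
def average_access_time(hit_ratio, cache_time, memory_time):
    """Average time per access under an assumed cost model:
    every access pays cache_time, and a miss also pays memory_time."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_time + (1.0 - hit_ratio) * memory_time


# Hypothetical latencies in cycles, used only to compare two hit ratios.
CACHE_TIME = 1
MEMORY_TIME = 100

mostly_hits = average_access_time(0.75, CACHE_TIME, MEMORY_TIME)
mostly_misses = average_access_time(0.25, CACHE_TIME, MEMORY_TIME)
```

Under these assumed figures, the lower hit ratio yields a markedly longer average access time, and contention among plural processors for the main memory would in effect enlarge `memory_time` further.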
In current cache devices, time locality and spatial locality of memory access are used to improve the cache hit ratio. Time locality is the concept that data accessed once is likely to be accessed again soon; cache devices exploit it through policies such as LRU (least recently used), under which data accessed once is not easily forced out. Spatial locality is the concept that data near data that has been accessed is also likely to be accessed. Cache devices exploit this concept as shown by a cache line 111 of a cache memory 110 in FIG. 2. Namely, a cache array 114 following an address 112 stores four block data, that is, an accessed block data 116-1 and the three block data 116-2 to 116-4 that follow it, and the data concerned is managed in units of cache blocks. Spatial locality, unlike time locality, involves taking into the cache device, in advance, even data that the processor has not actually requested. Developing this method further makes it possible to store in the cache, in advance, blocks that the processor will require shortly. This method is called pre-fetch. Pre-fetch further improves the cache hit ratio, so that memory access time can be shortened. Pre-fetch is effective not only for a single-processor system but also for a multiprocessor system. In a multiprocessor system, however, a new problem of useless sharing arises.
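The combination of LRU replacement, line-unit storage, and pre-fetch described above can be sketched as follows. This is a toy model under stated assumptions, not the cache device of FIG. 2: the class `LineCache`, the helper `run`, and the policy of pre-fetching only the next line, and only on a miss, are hypothetical choices made for illustration. Only the figure of four block data per line comes from the description.

```python
from collections import OrderedDict

BLOCKS_PER_LINE = 4  # four block data per cache line, as in FIG. 2


class LineCache:
    """Toy cache that stores whole lines and evicts the least recently used line."""

    def __init__(self, num_lines, prefetch=False):
        if num_lines < 1:
            raise ValueError("num_lines must be at least 1")
        self.num_lines = num_lines
        self.prefetch = prefetch      # assumed policy: fetch the next line on a miss
        self.lines = OrderedDict()    # line number -> True, least recently used first
        self.hits = 0
        self.misses = 0

    def _install(self, line):
        # Mark a line as most recently used, evicting the oldest line if the cache is full.
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)
        self.lines[line] = True

    def access(self, block):
        line = block // BLOCKS_PER_LINE
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)   # time locality: keep recently used data
        else:
            self.misses += 1
            self._install(line)            # spatial locality: fetch the whole line
            if self.prefetch:
                self._install(line + 1)    # pre-fetch: fetch a line not yet requested


def run(cache, blocks):
    """Feed a sequence of block numbers to the cache and return (hits, misses)."""
    for block in blocks:
        cache.access(block)
    return cache.hits, cache.misses


# Sequential scan over 32 blocks with a 4-line cache, without and with pre-fetch.
without_prefetch = run(LineCache(4), range(32))
with_prefetch = run(LineCache(4, prefetch=True), range(32))
```

For this sequential access pattern the pre-fetching variant misses less often, because every second line is already present when it is first requested. For an access pattern that never touches the following line, the same policy would only fetch useless data.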
In the cache system of a multiprocessor system, cache coherence is managed so that no inconsistency arises between the cache device of one processor and the cache device of another processor. For example, as shown in FIG. 3A, data stored in a plurality of cache devices 100-1 to 100-n is shared. When the processor 102-n writes to the shared data, the cache device 100-n first informs the other cache devices 100-1 and 100-2 holding the same data that the writing will be performed, so that their copies of the data are invalidated without fail, and only then is the writing performed, as shown in FIG. 3B. Through the invalidation, the other cache devices can know that the data they hold is no longer the newest. Ensuring in this manner that every processor reads the newest data whenever it reads is cache coherence management. In pre-fetch, a cache device predicts data that will be requested before long and reads that data together with the data required by the processor. However, this prediction is not necessarily correct, and useless data may therefore be read. Even in a single-processor system, useless reading by pre-fetch causes a problem, namely useless traffic between the main memory and the cache. In a multiprocessor system, not only useless traffic but also useless sharing arises. In other words, data that would not be shared under a method that reads only required data may come to be shared by plural caches under a method that uses pre-fetch. When writing to shared data in a cache device, the cache device must inform the other cache devices that the writing will be performed. Until this notification has been completed, the data cannot be updated. Therefore, writing to shared data is heavy processing, that is, processing that requires much time in the cache device.
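The useless sharing described above can be made concrete with a small sketch. It is a simplified model under stated assumptions rather than the coherence protocol of FIGS. 3A and 3B: the class `CoherentCaches`, the function `scenario`, the line numbers, and the next-line pre-fetch policy are all hypothetical. The model only records which caches hold a valid copy of each line and counts the invalidation notices a writer must send before it may update the data.

```python
class CoherentCaches:
    """Toy write-invalidate model that tracks which caches hold each line."""

    def __init__(self, prefetch=False):
        self.prefetch = prefetch   # assumed policy: each read also fetches the next line
        self.holders = {}          # line number -> set of cache ids with a valid copy
        self.invalidations = 0     # invalidation notices sent to other caches

    def _fill(self, cache_id, line):
        self.holders.setdefault(line, set()).add(cache_id)

    def read(self, cache_id, line):
        self._fill(cache_id, line)
        if self.prefetch:
            self._fill(cache_id, line + 1)   # speculative copy; may never be used

    def write(self, cache_id, line):
        # Every other holder must be notified before the write can proceed.
        others = self.holders.get(line, set()) - {cache_id}
        self.invalidations += len(others)
        self.holders[line] = {cache_id}      # the writer now holds the only valid copy


def scenario(prefetch):
    """Cache 0 uses only line 10; cache 1 reads and then writes line 11."""
    system = CoherentCaches(prefetch=prefetch)
    system.read(0, 10)
    system.read(1, 11)
    system.write(1, 11)
    return system.invalidations


notices_without_prefetch = scenario(prefetch=False)
notices_with_prefetch = scenario(prefetch=True)
```

In this scenario cache 0 never requests line 11. Without pre-fetch the write by cache 1 needs no notification; with pre-fetch, cache 0 holds a speculative copy of line 11, so the write must first invalidate it. That extra notification is the write overhead attributed above to useless sharing.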
As a result, when pre-fetch is used in a multiprocessor system, the drawbacks of useless traffic and useless sharing cancel out the advantage of the pre-fetch, and the multiprocessor system does not exhibit superior performance. As described above, in conventional cache devices, pre-fetch improves the cache hit ratio but also increases the overhead at the time of writing due to useless sharing and increases the amount of data transmission. Consequently, there is a problem that pre-fetch cannot easily be applied to multiprocessor systems.