The present invention relates to a multiprocessor system in which a plurality of processors share an n-way set-associative cache memory (where n is 2 or larger), and more particularly to a replacement control technology that is applied when a miss-hit occurs.
Conventionally, a multiprocessor system composed of a plurality of processors has been used to increase the processing performance of an information processing system. Advances in technology have also made it possible to provide an on-chip multiprocessor system, in which a multiprocessor system is built on one chip. In the field of multiprocessor systems and on-chip multiprocessor systems, considerable study has been made of the cache configuration, and many inventions have been made to increase the hit ratio of the cache memory and to reduce the number of cycles required for data transfer from the cache memory to a register.
There are two cache configuration methods: one is a private cache method, in which each of a plurality of processors has its own cache memory, and the other is a shared cache method, in which a plurality of processors share a cache memory. Compared with the private cache method, the shared cache method requires fewer memory devices. Therefore, the shared cache method is advantageous for an on-chip multiprocessor system with a limited chip dimension and for cases in which compactness and low cost are important.
With reference to the drawings, the operation of a conventional standard multiprocessor system using a shared cache method will be described.
Referring to FIG. 7, a multiprocessor system with a shared cache includes processors P0 and P1, a cache controller CC, a cache memory BS, and a main memory MS. The cache memory BS is a 4-way set-associative cache. When a miss-hit occurs, the LRU (least recently used) method is used to select the block to be replaced.
For example, assume that the processor P0 sequentially outputs addresses in blocks A and B and that the processor P1 then sequentially outputs addresses in blocks C, D and E. After that, assume that the processor P0 sequentially outputs addresses in blocks A and B again. Note that blocks A-E all belong to the same set i and that the cache memory BS is in the initial state.
When the processor P0 outputs addresses in blocks A and B and the processor P1 outputs addresses in blocks C and D, the copies of blocks A, B, C and D are stored in ways 0, 1, 2, and 3 of set i of the cache memory BS, respectively (FIG. 8a).
After that, when the processor P1 outputs an address in block E, a miss-hit occurs and the copy of block A stored in the least-recently referenced way 0 is replaced by the copy of block E (FIG. 8b). Next, when the processor P0 outputs an address in block A, a miss-hit occurs and the copy of block B stored in the least-recently referenced way 1 is replaced by the copy of block A (FIG. 8c). Finally, when the processor P0 outputs an address in block B, a miss-hit occurs and the copy of block C stored in the least-recently referenced way 2 is replaced by the copy of block B (FIG. 8d).
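By way of illustration only, the replacement sequence described above can be reproduced with a minimal software sketch of a single set under LRU replacement. The function name simulate_lru_set, the list-based bookkeeping, and the rule that an empty way is filled in ascending way order are assumptions made for this sketch; they are not taken from the cache controller CC.

```python
def simulate_lru_set(accesses, num_ways=4):
    """Model one set of an n-way set-associative cache with LRU replacement.

    accesses: list of (processor, block) pairs, all mapping to the same set.
    Returns (ways, log); each log entry is
    (processor, block, "hit" or "miss", way used, replaced block or None).
    """
    ways = [None] * num_ways   # ways[w] holds the block whose copy is in way w
    lru_order = []             # way numbers, least recently referenced first
    log = []
    for proc, block in accesses:
        if block in ways:
            # Hit: mark the way as most recently referenced.
            way = ways.index(block)
            lru_order.remove(way)
            lru_order.append(way)
            log.append((proc, block, "hit", way, None))
            continue
        if None in ways:
            # Miss with an empty way (assumed: lowest-numbered empty way).
            way = ways.index(None)
            victim = None
        else:
            # Miss with a full set: replace the least recently referenced way.
            way = lru_order.pop(0)
            victim = ways[way]
        ways[way] = block
        lru_order.append(way)
        log.append((proc, block, "miss", way, victim))
    return ways, log


# Access sequence of the example: P0 -> A, B; P1 -> C, D, E; P0 -> A, B.
trace = [("P0", "A"), ("P0", "B"), ("P1", "C"), ("P1", "D"),
         ("P1", "E"), ("P0", "A"), ("P0", "B")]
final_ways, access_log = simulate_lru_set(trace)
for proc, block, result, way, victim in access_log:
    print(proc, block, result, "way", way, "replaces", victim)
print(final_ways)  # contents of ways 0-3 after the last access: ['E', 'A', 'B', 'D']
```

In this sketch, the last three accesses replace the copies of blocks A, B and C in ways 0, 1 and 2, respectively, which corresponds to the transitions described for FIGS. 8b, 8c and 8d.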
In the prior art described above, when the processors P0 and P1 access many blocks (more blocks than the number of ways) in the same set i, the copy of a block accessed by one processor is sometimes replaced by the copy of another block accessed by the other processor, with the result that a miss-hit (a conflict-miss between processors) occurs. In the example shown in FIG. 8, the miss-hits shown in FIGS. 8c and 8d are caused by the replacement, shown in FIG. 8b, of the copy of block A stored in way 0. The problem is that each conflict-miss between processors forces a further block replacement and thereby slows the processing speed of the multiprocessor system.
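As a rough illustration of this problem, the following sketch counts the miss-hits of each processor for the access sequence of the example, once with both processors sharing set i and once with the processor P0 running alone. The function name misses_per_processor and the use of an OrderedDict to track reference order are assumptions made for the sketch and do not describe the hardware.

```python
from collections import OrderedDict


def misses_per_processor(accesses, num_ways=4):
    """Count miss-hits per processor for one set under LRU replacement."""
    resident = OrderedDict()   # blocks in the set, least recently referenced first
    misses = {}
    for proc, block in accesses:
        misses.setdefault(proc, 0)
        if block in resident:
            resident.move_to_end(block)       # hit: now most recently referenced
        else:
            misses[proc] += 1
            if len(resident) >= num_ways:
                resident.popitem(last=False)  # replace least recently referenced
            resident[block] = True
    return misses


shared_trace = [("P0", "A"), ("P0", "B"), ("P1", "C"), ("P1", "D"),
                ("P1", "E"), ("P0", "A"), ("P0", "B")]
p0_only_trace = [access for access in shared_trace if access[0] == "P0"]

shared_misses = misses_per_processor(shared_trace)
alone_misses = misses_per_processor(p0_only_trace)
extra_p0_misses = shared_misses["P0"] - alone_misses["P0"]
print(shared_misses, alone_misses, extra_p0_misses)
```

Under these assumptions, the processor P0 incurs four miss-hits when the set is shared but only two (the initial loads of blocks A and B) when it runs alone; the two additional miss-hits correspond to those shown in FIGS. 8c and 8d.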