In computer systems, generally, a cache memory that is smaller in capacity and faster than the main memory is provided in addition to the main memory. Part of the information stored in the main memory is copied to the cache. When this information needs to be accessed, a faster retrieval of information is attained by reading the information from the cache rather than from the main memory.
The cache contains a plurality of cache lines, and information is copied from the main memory to the cache in units of cache lines. The memory space of the main memory is divided into segments, each the size of one cache line, and these segments are sequentially assigned to the cache lines. Since the capacity of the cache is smaller than the capacity of the main memory, a plurality of memory segments are repeatedly assigned to the same cache line.
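The repeated assignment of memory segments to cache lines can be illustrated with a minimal sketch. The line size, the number of lines, and the function name are assumptions made only for this illustration and do not come from the described system.

```python
# Hypothetical sizes, chosen only for illustration.
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_LINES = 256   # number of cache lines in the cache (assumed)

def line_for_address(addr: int) -> int:
    """Return the cache line assigned to the memory segment containing addr."""
    segment = addr // LINE_SIZE   # which line-sized segment of main memory
    return segment % NUM_LINES    # segments wrap around onto the same lines

# Two addresses exactly one cache capacity apart are assigned to the same line.
a = 0x1040
b = a + LINE_SIZE * NUM_LINES
print(line_for_address(a), line_for_address(b))  # prints "65 65"
```

Because the segment number is reduced modulo the number of lines, every segment that is a multiple of the cache capacity away from another competes with it for the same cache line.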
When a first access is performed with respect to a given address in memory space, information (data or program) stored at this address is copied to a corresponding cache line provided in the cache. When a next access is performed with respect to the same address, the information is retrieved directly from the cache. In general, a predetermined number of lower-order bits of an address serve as an index for caching, and the remaining higher-order bits serve as a cache tag.
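As a rough sketch of how an address may be split into an index and a tag, the following example uses assumed bit widths. The byte offset within a cache line is an additional field assumed here for completeness; the description above mentions only the index and the tag.

```python
# Assumed field widths, for illustration only.
OFFSET_BITS = 6   # 64-byte cache lines (assumed)
INDEX_BITS = 8    # 256 cache lines (assumed)

def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # byte within the line
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # lower-order bits select the line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # remaining higher-order bits
    return tag, index, offset

print(split_address(0x12345))  # prints "(4, 141, 5)"
```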
When data is to be accessed, the index portion of the address to be accessed is used to read the tag stored at the corresponding index in the cache. A check is then made as to whether the retrieved tag matches the bit pattern of the tag portion of the address. If there is no match, a cache miss is detected. If there is a match, a cache hit is detected, and the cache data corresponding to this index (data of a predetermined number of bits equal in size to one cache line) is accessed.
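The tag comparison described above may be sketched as follows. The class name, the number of index bits, and the use of line-granular addresses (no byte offset) are simplifying assumptions for illustration.

```python
INDEX_BITS = 4  # assumed: 16 cache lines; addresses here are line-granular

class DirectMappedCache:
    """Hypothetical cache holding one tag and one line of data per index."""

    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)
        self.data = [None] * (1 << INDEX_BITS)

    def lookup(self, addr):
        """Return (hit, data): compare the stored tag with the address tag."""
        index = addr & ((1 << INDEX_BITS) - 1)
        tag = addr >> INDEX_BITS
        if self.tags[index] == tag:
            return True, self.data[index]   # cache hit
        return False, None                  # cache miss

    def fill(self, addr, line):
        """Copy a line into the cache, replacing whatever occupied the index."""
        index = addr & ((1 << INDEX_BITS) - 1)
        self.tags[index] = addr >> INDEX_BITS
        self.data[index] = line

cache = DirectMappedCache()
print(cache.lookup(0x25))   # miss: nothing has been cached yet
cache.fill(0x25, "line A")
print(cache.lookup(0x25))   # hit: index 5 holds tag 2
print(cache.lookup(0x35))   # miss: same index 5, but tag 3 does not match
```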
A cache configuration in which only one tag is provided for each cache line is called a direct mapping system. A cache configuration in which N tags are provided for each cache line is called an N-way set-associative system. The direct mapping system may be regarded as a one-way set-associative system.
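The difference between direct mapping and N-way set association can be seen in a small simulation. This is only a sketch: the class name is hypothetical, and the first-in first-out replacement used when a set is full is an assumption, since no replacement policy is specified above.

```python
class SetAssociativeCache:
    """Hypothetical N-way cache that tracks tags only (no data)."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # tags per set, oldest first

    def access(self, line_addr: int) -> bool:
        """Return True on a hit; on a miss, allocate the line and return False."""
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            return True
        if len(tags) == self.ways:
            tags.pop(0)        # assumed FIFO replacement
        tags.append(tag)
        return False

def count_hits(ways: int) -> int:
    cache = SetAssociativeCache(num_sets=8, ways=ways)
    # Line addresses 0 and 8 map to the same index and alternate.
    return sum(cache.access(a) for a in [0, 8, 0, 8])

print(count_hits(1))  # one-way (direct mapping): the two lines keep evicting each other
print(count_hits(2))  # two-way: both lines fit in the same set
```

With one tag per index the alternating accesses never hit, whereas with two tags per index the third and fourth accesses hit.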
A cache memory hierarchy is employed for the purpose of reducing the penalty associated with accessing the main memory when a cache miss occurs. A secondary cache that is accessible at a faster speed than the main memory may be provided between the primary cache and the main memory. With this configuration, a cache miss at the primary cache less frequently requires access to the main memory, which reduces the cache miss penalty.
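The effect of a secondary cache on the miss penalty may be estimated with the usual average-access-time calculation. The cycle counts and miss rates below are made-up values for illustration, and the function names are hypothetical.

```python
def amat_single_level(l1_hit_time, l1_miss_rate, memory_time):
    """Average access time when every primary-cache miss goes to main memory."""
    return l1_hit_time + l1_miss_rate * memory_time

def amat_two_level(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, memory_time):
    """Average access time when a secondary cache sits in front of main memory."""
    l1_miss_penalty = l2_hit_time + l2_miss_rate * memory_time
    return l1_hit_time + l1_miss_rate * l1_miss_penalty

# Assumed figures: 1-cycle primary cache, 10-cycle secondary cache,
# 100-cycle main memory, 10% primary miss rate, 50% secondary miss rate.
print(amat_single_level(1, 0.1, 100))       # about 11 cycles
print(amat_two_level(1, 0.1, 10, 0.5, 100)) # about 7 cycles
```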
Conventionally, increases in the processing speed of processors have been achieved by increasing their operating frequencies and by improving their architectures. In recent years, however, a technological limit to further frequency increases has begun to be perceived. There has thus been a shift towards multiprocessor configurations that use a plurality of processors for the purpose of increasing processing speed.
A system having a plurality of processors may be implemented by providing a plurality of conventional single processor cores each having a cache and by connecting these in a straightforward manner. While such a configuration may reduce design cost, it may suffer from problems with its cache utilization rate and cache consistency.
The applicant of the present application has provided a shared distributed cache mechanism as a solution to the above-noted problem. In this mechanism, each processor has a cache, and a given processor may utilize another processor's cache as a lower level cache situated under its own cache.
FIG. 1 is a drawing illustrating an example of the configuration of a shared distributed cache system. The shared distributed cache system illustrated in FIG. 1 includes a plurality of cores (i.e., processors) 11 through 13, a plurality of caches 14 through 16 provided in one-to-one correspondence with the cores 11 through 13, an inter-cache connection controller 17 connected to the caches 14 through 16, and a main memory 18. The cores 11 through 13 may access the caches (i.e., self-core caches) 14 through 16 connected directly thereto, respectively, as a primary cache. In this shared distributed cache system, provision is made such that another core's cache may be accessed as a secondary cache. Namely, when viewed from the core 11, the cache 14 is accessible as a primary cache, and the caches 15 and 16 are accessible as secondary caches. A path through which a secondary cache is accessed is provided through the inter-cache connection controller 17.
FIG. 2 is a flowchart illustrating a data load access operation performed in the shared distributed cache system illustrated in FIG. 1. In operation S1 of FIG. 2, one of the cores 11 through 13 issues a load request to the cache (i.e., self-core cache) directly connected thereto.
In operation S2, the cache that has received the load request checks whether the requested data resides in the cache, i.e., checks whether the access is a cache hit. In the case of a cache hit, the requested data is read from the self-core cache in operation S3 to be transmitted to the core that has issued the load request.
If a cache miss is detected in operation S2, the procedure goes to operation S4. In operation S4, a check is made as to whether a lower level cache exists under the cache for which the cache miss is detected. When the procedure goes to operation S4 upon detecting a cache miss in operation S2, the cache for which the cache miss is detected is one of the caches 14 through 16, so that the two remaining caches may serve as lower level caches. If a lower level cache exists, the procedure goes to operation S5.
In operation S5, another core's cache that is situated in a next lower cache level is accessed. In operation S6, the cache that has received the access request checks whether the requested data resides in the cache, i.e., checks whether the access is a cache hit. In the case of a cache miss, the procedure goes back to operation S4 to repeat the subsequent operations.
If a cache hit is detected in operation S6, the procedure goes to operation S7. In operation S7, a check is made as to whether the cache data is to be moved. When the cache data is to be moved, the accessed cache line (i.e., accessed data) is moved in operation S8 from the cache-hit cache to the self-core cache, and the data is then transmitted to the core. A cache line that is evicted from the self-core cache as a result of this move is moved to another core's cache. If it is ascertained in operation S7 that the cache data is not to be moved, the accessed data is transmitted in operation S9 from the cache-hit cache to the core that has issued the load request.
If it is ascertained in operation S4 that a lower level cache does not exist, the procedure goes to operation S10. When the procedure goes to operation S4 upon detecting a cache miss in operation S6, the cache for which the cache miss is detected may be the lowest level cache, in which case only the main memory 18 exists under this level. The requested data is then read from the main memory 18 in operation S10 and allocated to the self-core cache (i.e., the requested data corresponding to one cache line is copied to the self-core cache), and the data is transmitted to the core that has issued the load request. A cache line that is evicted from the self-core cache as a result of this operation is moved to a lower level cache, for example.
In the operation flow described above, operations S7 through S9 relate to data transmission between caches, and operation S10 relates to data transmission between a memory and a cache.
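The load flow of FIG. 2 may be summarized by the following simplified sketch. It is illustrative only: caches are modeled as unbounded dictionaries so that eviction is omitted, the order in which other cores' caches are searched is an assumption, the move decision of operation S7 is reduced to a flag, and the label S9 for the path without a move is inferred from the range of operations named above.

```python
def load(core, addr, caches, memory, move=True):
    """Return (data, operation) for a load by `core`; all names are hypothetical."""
    own = caches[core]
    if addr in own:                          # S2: hit in the self-core cache?
        return own[addr], "S3"               # S3: read from the self-core cache
    # S4/S5: other cores' caches serve as lower level caches (search order assumed)
    for other in (c for i, c in enumerate(caches) if i != core):
        if addr in other:                    # S6: hit in a lower level cache?
            if move:                         # S7: is the cache data to be moved?
                own[addr] = other.pop(addr)  # S8: move the line to the self-core cache
                return own[addr], "S8"
            return other[addr], "S9"         # S9 (inferred label): transmit without moving
    own[addr] = memory[addr]                 # S10: allocate from main memory
    return own[addr], "S10"

caches = [{}, {0x10: "x"}, {}]               # one dictionary per core
memory = {0x10: "x", 0x20: "y"}
print(load(0, 0x20, caches, memory))         # ('y', 'S10')
print(load(0, 0x20, caches, memory))         # ('y', 'S3')
print(load(0, 0x10, caches, memory))         # ('x', 'S8')
print(load(2, 0x10, caches, memory, move=False))  # ('x', 'S9')
```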
In operation S10, all the entries corresponding to the accessed index may already be in use in the self-core cache. In such a case, one of these cache entries needs to be evicted in order to allocate the accessed data, which requires that the cache entry to be evicted be selected according to some criteria. Further, there may be a need to determine, based on some criteria, which entry of which cache is to be used as the destination of the evicted cache entry. There is also the option of discarding the evicted cache entry. These choices are preferably made so as to improve the cache utilization rate, and the eviction and transmission operations are preferably performed in an efficient manner.
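The choices involved can be made concrete with a small sketch. The criteria shown here are assumptions chosen only to illustrate the decision points and are not the criteria of any particular system: the victim is the oldest entry in the set, the destination is the first other cache with a free entry at the same index, and the victim is discarded when no such cache exists.

```python
from collections import OrderedDict

WAYS = 2  # assumed number of entries per index

def allocate(own_set, other_sets, tag, line):
    """Insert (tag, line) into own_set, which holds the entries for one index.

    other_sets maps a hypothetical cache id to that cache's entries for the
    same index. Returns a string describing what happened to any evicted entry.
    """
    outcome = "no eviction"
    if len(own_set) >= WAYS:
        # Assumed criterion: evict the oldest entry in the set.
        victim_tag, victim_line = own_set.popitem(last=False)
        outcome = "discarded"
        for cache_id, entries in other_sets.items():
            if len(entries) < WAYS:          # assumed criterion: first free entry
                entries[victim_tag] = victim_line
                outcome = f"moved to cache {cache_id}"
                break
    own_set[tag] = line
    return outcome

own = OrderedDict([(1, "a"), (2, "b")])                       # self-core cache set is full
others = {15: OrderedDict([(7, "x"), (8, "y")]), 16: OrderedDict()}
print(allocate(own, others, 3, "c"))   # moved to cache 16
print(list(own))                       # [2, 3]
```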