In general, a cache duplicates a certain part of main memory so that a processor core, or central processing unit (CPU) core, can access the duplicated part in a short amount of time, which keeps the pipeline of the processor core operating without interruption.
Currently, cache addressing works as follows. First, an index part of an address is used to read out a tag from a tag memory. At the same time, the index and an offset part of the address are used to read out contents from the cache. The tag from the tag memory is then compared with a tag part of the address. If the two tags are the same, which is called a cache hit, the contents read out from the cache are valid. If the two tags differ, which is called a cache miss, the contents read out from the cache are invalid. For a multi-way set associative cache, the above operation is performed in parallel on each way to detect which way has a cache hit. Contents read out from the way with the cache hit are valid. If all ways experience cache misses, contents read out from any way are invalid. After a cache miss, cache control logic fills the cache with contents from a lower level storage medium.
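The addressing steps described above can be sketched in software as follows. This is a minimal illustration only, not a description of any particular hardware: the bit widths, the number of ways, and the names `split_address`, `SetAssociativeCache`, `lookup`, and `fill` are assumptions introduced for the example, and the loop over ways stands in for what hardware does in parallel.

```python
# Assumed geometry for illustration only (not taken from the text).
OFFSET_BITS = 4   # 16-byte cache blocks
INDEX_BITS = 2    # 4 entries per way
NUM_WAYS = 2      # 2-way set associative


def split_address(addr):
    """Split an address into its (tag, index, offset) parts."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self):
        entries = 1 << INDEX_BITS
        block = bytes(1 << OFFSET_BITS)
        # Tag memory and content memory, one array per way.
        # A tag of None marks an entry that has never been filled.
        self.tags = [[None] * entries for _ in range(NUM_WAYS)]
        self.data = [[block] * entries for _ in range(NUM_WAYS)]

    def lookup(self, addr):
        """Return (hit, content). On a miss the content is not valid."""
        tag, index, offset = split_address(addr)
        # Hardware reads and compares all ways at once; the loop models that.
        for way in range(NUM_WAYS):
            stored_tag = self.tags[way][index]        # read tag via index
            content = self.data[way][index][offset]   # read via index + offset
            if stored_tag == tag:                     # compare with address tag
                return True, content                  # cache hit: content valid
        return False, None                            # miss in every way

    def fill(self, addr, block, way):
        """Model the control logic filling one way after a miss."""
        tag, index, _ = split_address(addr)
        self.tags[way][index] = tag
        self.data[way][index] = bytes(block)
```

Reading every way before the tag comparison resolves, as the loop does, mirrors the parallel read-out described above and shows why each additional way adds to the work done per access.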
Cache misses can be divided into three types: compulsory misses, conflict misses, and capacity misses. Under existing cache structures, compulsory misses are inevitable except for a small amount of pre-fetched contents, and current pre-fetching operations carry a considerable penalty. Further, while a multi-way set associative cache may help reduce conflict misses, the number of ways cannot exceed a certain limit due to power and speed constraints (e.g., the set-associative cache structure requires that contents and tags from all ways addressed by the same index be read out and compared at the same time). Further, because cache memories are expected to match the speed of the CPU core, it is difficult to increase cache capacity. Thus, multiple layers of cache are created, with a lower layer cache having a larger capacity but a slower speed than a higher layer cache.
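The three miss types can be illustrated with a small trace-driven sketch. The text does not say how misses are assigned to a type, so the sketch assumes a commonly used convention: a miss on the first reference to a block is compulsory; a later miss that would also occur in a fully associative cache of the same capacity with least-recently-used replacement is a capacity miss; and any other miss is a conflict miss. The cache under study is assumed to be direct-mapped for brevity, and the name `classify_misses` and its parameters are hypothetical.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks):
    """Count hits and miss types for a direct-mapped cache of num_blocks blocks.

    block_trace is a sequence of block numbers (addresses with the offset
    bits already removed).
    """
    seen = set()                    # blocks referenced at least once
    direct = [None] * num_blocks    # direct-mapped cache under study
    full = OrderedDict()            # fully associative LRU reference cache
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        index = block % num_blocks
        hit_direct = direct[index] == block
        hit_full = block in full

        # Update the fully associative LRU reference cache.
        if hit_full:
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)   # evict least recently used block
            full[block] = True

        # Update the direct-mapped cache (fill on miss, no-op on hit).
        direct[index] = block

        if hit_direct:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1      # first reference to this block
        elif hit_full:
            counts["conflict"] += 1        # full associativity would have hit
        else:
            counts["capacity"] += 1        # working set exceeds the capacity
        seen.add(block)

    return counts
```

For example, with four blocks of capacity, the trace `[0, 4, 0, 4]` alternates between two blocks that map to the same index, so the last two references count as conflict misses even though the cache is mostly empty.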
Thus, modern cache systems normally comprise multiple layers of cache in a multi-way set associative configuration. Newer structures and techniques such as victim caches, trace caches, and pre-fetching (putting the next cache block into a cache buffer while fetching a cache block, or doing so under a pre-fetch instruction) have been used to address certain shortcomings. However, with the widening gap between the speed of the processor and the speed of the memory, the existing cache architectures, especially given the various cache miss possibilities, remain a bottleneck in increasing the performance of modern processors or computing systems. In addition, current cache systems often do not consider the data cache together with the instruction cache.
The disclosed methods and systems are directed to solving one or more of the problems set forth above, as well as other problems.