Throughout the development of computer technology, main memory (i.e., physical memory, or simply "memory") has stored and retrieved data at speeds far slower than those of a CPU, leaving the CPU's processing power underutilized. To bridge the mismatched speeds of a CPU and main memory, a high-speed cache (simply "cache") can be introduced between them.
Although the storage capacity of a cache tends to be much smaller than that of main memory, a cache can store and retrieve data at a speed matching that of a CPU. According to the principle of locality, the instructions and data a CPU is currently accessing are likely to be accessed again soon, as are the neighboring memory sections. Computer hardware is therefore usually designed to automatically load data from memory sections related to the data currently accessed by the CPU into a cache, so that when the CPU needs data from memory it accesses the cache first, and only on a cache miss does it access the memory itself. This approach reduces the CPU's direct accesses to memory as much as possible, thereby enhancing the overall processing speed of the computer system.
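The cache-first access pattern described above can be sketched as follows. This is an illustrative software model, not the claimed hardware; the class and attribute names (`SimpleCache`, `lines`, the `memory` dictionary standing in for slow main memory) are assumptions made for the example.

```python
class SimpleCache:
    """Toy model of a cache sitting between a CPU and main memory."""

    def __init__(self, backing_memory):
        self.memory = backing_memory   # stands in for slow main memory
        self.lines = {}                # address -> data, the fast cache
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:      # cache hit: no memory access needed
            self.hits += 1
            return self.lines[address]
        self.misses += 1               # cache miss: fall through to memory
        data = self.memory[address]
        self.lines[address] = data     # load into cache for later reuse
        return data

memory = {0x10: "a", 0x20: "b"}
cache = SimpleCache(memory)
cache.read(0x10)   # first access misses and is fetched from memory
cache.read(0x10)   # repeated access hits the cache (temporal locality)
```

A real cache would also prefetch neighboring lines (spatial locality) and evict old entries under a capacity limit; those details are omitted here for brevity.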
In recent years, computer systems have evolved toward multi-core processors. A typical architecture of such a system is shown in FIG. 1. Computer system 10 includes a plurality of CPU cores 12_0, 12_1, . . . , 12_n−1, where n is a natural number greater than or equal to 1. Each core stores the data it accesses in its respective private caches, for example, L1 caches 14_1, 14_2, 14_3, . . . , 14_n and L2 caches 16_1, 16_2, 16_3, . . . , 16_n, to speed up core processing. Because the same data might be stored in this manner by the private caches of multiple cores, a shared cache is introduced to reduce cache redundancy. The system 10 usually also includes a cache index 22 and a last-level cache (LLC) 18, which is shared by all the cores and accessed before the memory controller 20. Such a shared cache allows data to be shared among multiple cores, reducing communication delay while also reducing data redundancy and enhancing cache space usage efficiency.
Because private caches can hold multiple copies of the same data, present techniques usually rely on an index-based consistency protocol to assure data consistency. A cache index tracks the data in the different private caches of the different cores, recording which private caches hold a copy of which data, and data read and data write operations are executed based upon that index. For example, when a CPU core needs to write data to its corresponding private cache, it first consults the cache index to determine which other private caches also store a copy of the data, notifies those private caches to invalidate (set as NULL) the corresponding data, and then executes the write operation, thereby assuring cache data consistency.
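The index-based write flow described above can be modeled in a few lines. This is a simplified sketch under assumed names (`CacheIndex`, `holders`, a per-core dictionary standing in for each private cache), not an actual hardware coherence protocol: the index records which cores hold a copy of each address, and before a core writes, every other holder's copy is invalidated.

```python
class CacheIndex:
    """Toy directory: tracks which cores hold a copy of each address."""

    def __init__(self):
        self.holders = {}  # address -> set of core ids holding a copy

    def record_read(self, core, address):
        # A core that reads an address becomes a registered holder.
        self.holders.setdefault(address, set()).add(core)

    def write(self, core, address, private_caches, value):
        # Consult the index: which other private caches hold this data?
        for other in self.holders.get(address, set()) - {core}:
            private_caches[other].pop(address, None)  # invalidate stale copy
        self.holders[address] = {core}  # writer is now the sole holder
        private_caches[core][address] = value

# Two cores have each cached address 0x10 before core 0 writes to it.
caches = {0: {0x10: "old"}, 1: {0x10: "old"}}
index = CacheIndex()
index.record_read(0, 0x10)
index.record_read(1, 0x10)
index.write(0, 0x10, caches, "new")
# Core 1's stale copy is gone; core 0 alone holds the new value.
```

Note that every write must consult the index, which is precisely the bottleneck the following paragraphs describe.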
However, the above-described system and its cache-index-based consistency protocol have several problems. First, because the cache index tracks data in the private caches of every CPU core, an increased number of cores results in a larger cache index, which occupies more cache space and severely limits how far the processor's core count can scale.
Second, there is an unavoidable conflict between private caches and shared caches. Private caches holding multiple copies of the same data decrease cache usage efficiency. Although shared caches reduce data redundancy and increase cache space utilization, as the number of cores grows and the LLC must interconnect with every CPU core, the hardware latency inherent to the LLC design increases, extending cache latency.
Lastly, an index-based data consistency protocol must examine the private caches of all the CPU cores, constraining data read and data write operations. For example, every data read operation has to visit the LLC and the cache index to assure that the accessed data is consistent with the copies in the private caches of other CPU cores, reducing data access performance.