The present invention relates to methods and apparatus for controlling hierarchical cache memories and, more particularly, to a control technique where storage of data into a cache line of a lower level cache memory is prohibited when such storage would overwrite data of that cache line that is already stored in a higher level cache memory.
In recent years, there has been an insatiable demand for higher data processing throughput because cutting-edge computer applications are becoming more and more complex and are placing ever-increasing demands on microprocessing systems. Conventional microprocessing systems (which employ a microprocessor and an associated memory) have very short cycle times (i.e., the unit of time in which a microprocessor is capable of manipulating data), such as one nanosecond. The time required to access data stored in main memory, however, may be considerably longer than the cycle time of the microprocessor. For example, the access time required to obtain a byte of data from a main memory implemented utilizing dynamic random access memory (DRAM) technology is on the order of 60 nanoseconds.
In order to ameliorate the bottleneck imposed by the relatively long access time of DRAM memory, those skilled in the art have utilized cache memories. A cache memory augments the main memory in order to improve the throughput of the system. While the main memory is often implemented utilizing relatively inexpensive, slow, DRAM memory technology, the cache memory is typically implemented utilizing more expensive, fast, static random access memory (SRAM) technology. Given that the cache memory is implemented utilizing a high-cost technology, it is usually of a much smaller size than the main memory.
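As a rough illustration of why a small, fast cache memory improves throughput, the average access time can be estimated with the standard relation below. The 1 ns and 60 ns figures come from the example cycle time and DRAM access time given above. The assumption that a cache hit costs one cycle, and the 5% miss rate m, are hypothetical values chosen only for the arithmetic.

```latex
t_{\text{avg}} = t_{\text{hit}} + m \cdot t_{\text{miss}}
              = 1\,\text{ns} + 0.05 \times 60\,\text{ns}
              = 4\,\text{ns}
```

Under these assumed numbers, the average access time is far closer to the cycle time of the microprocessor than to the access time of the main memory.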
Due to the relatively small size of cache memories, conventional algorithms have been employed to determine what data should be stored in the cache memory at various times during the operation of the microprocessing system. These conventional algorithms may be based on, for example, the theoretical concept of “locality of reference,” which takes advantage of the fact that relatively small portions of an executable program are used by the microprocessor at any particular point in time. Thus, in accordance with the concept of locality of reference, only small portions of the executable program are stored in cache memory at any particular point in time. These or other algorithms may also be employed to control the storage and retrieval of data (which may be used by the executable program) in the cache memory.
The particularities of the known algorithms for taking advantage of locality of reference, or any other concept, for controlling the storage of executable programs and/or data in a cache memory are too numerous to present in this description. Suffice it to say, however, that any given algorithm may not be suitable in all applications as the data processing goals of various applications may differ significantly.
In conventional algorithms for controlling a cache memory, the microprocessor issues data access requests to the cache memory. When the requested data are stored in the cache memory, a cache hit occurs and the microprocessor receives the data relatively quickly. When a data access request cannot be satisfied by accessing the cache memory, i.e., when a cache miss occurs, it is desirable to execute a data refill sequence in which the data are obtained from main memory and stored in the cache memory.
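The hit, miss and refill sequence described above may be sketched as follows. This is a minimal illustration only: the class name SimpleCache, the direct-mapped organization, the four-line size and the use of a dict as main memory are all assumptions made for the example and are not taken from any particular conventional system.

```python
class SimpleCache:
    """Toy direct-mapped cache (hypothetical) in front of a dict used as main memory."""

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory
        self.num_lines = num_lines
        # Each line holds an (address, data) pair, or None when invalid.
        self.lines = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines
        entry = self.lines[index]
        if entry is not None and entry[0] == address:
            # Cache hit: the data are returned without accessing main memory.
            self.hits += 1
            return entry[1]
        # Cache miss: run the refill sequence from main memory.
        self.misses += 1
        data = self.main_memory[address]
        self.lines[index] = (address, data)
        return data


memory = {addr: addr * 10 for addr in range(16)}
cache = SimpleCache(memory)
first = cache.read(5)   # miss, line refilled from main memory
second = cache.read(5)  # hit, served from the cache
```

In this sketch a later read of address 9 would map to the same line as address 5 and overwrite it, which is the kind of conflict that motivates the set associative organizations discussed below.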
The cache memory may be disposed “on-chip” with the microprocessor, in which case it is called a level-one (L1) cache memory, or it may be disposed separately from the microprocessor, i.e., off-chip, in which case it is called a level-two (L2) cache memory. L1 cache memories usually have a much faster access time than L2 cache memories. A combined L1, L2 cache memory system, sometimes referred to as a hierarchical cache memory, may also be formed in which both an on-chip cache memory and an off-chip cache memory are employed. In this configuration, when the microprocessor makes an access request for data, the L1 cache memory is accessed first to satisfy the request and, if the request cannot be satisfied there, the L2 cache memory is accessed. If an L2 cache memory miss occurs, then the main memory is accessed and the L1 and L2 cache memories are refilled.
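The access order of such a hierarchical cache memory may be sketched as below. The function name hierarchical_read and the use of unbounded dicts for the L1 and L2 cache memories are assumptions made for brevity. Refilling the L1 cache memory on an L2 hit is likewise an assumption of this sketch; the description above only states what happens on an L2 miss.

```python
def hierarchical_read(address, l1, l2, main_memory):
    """Return (data, source) for a read that tries L1, then L2, then main memory."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        data = l2[address]
        l1[address] = data  # assumed behavior: refill L1 from L2 on an L2 hit
        return data, "L2"
    # L2 miss: access main memory and refill both cache levels.
    data = main_memory[address]
    l2[address] = data
    l1[address] = data
    return data, "main"


main_memory = {0x40: "a"}
l1, l2 = {}, {}
r1 = hierarchical_read(0x40, l1, l2, main_memory)  # served by main memory
r2 = hierarchical_read(0x40, l1, l2, main_memory)  # served by L1
```

Capacity limits and replacement are deliberately left out here so that only the lookup order is shown.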
In order to reduce conflict occurrences between the L1 cache memory and the L2 cache memory, and to improve access efficiency, the L2 cache memory may be an N-way set associative memory having more ways than the L1 cache memory. In accordance with conventional techniques, when the L2 cache memory is refilled (i.e., after an L2 cache memory miss has occurred), one cache line from among N cache lines of the L2 cache memory must be selected to receive the refill data. If one or more of the N cache lines contain invalid data, then the refill data are stored in one of those cache lines. If all N cache lines contain valid data, however, then a cache line is selected to receive the refill data by random selection, by the well known Least Recently Used (LRU) algorithm, or by any other algorithm. In any case, if valid data are overwritten in a cache line of the L2 cache memory, and a copy of such valid data is also contained in a cache line of the L1 cache memory, then that cache line of the L1 cache memory must be invalidated in order to assure consistency between the L1 cache memory and the L2 cache memory.
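The conventional refill selection just described may be sketched as follows for a single set of the L2 cache memory. The names Way, select_victim and refill_l2_set, the timestamp-based LRU bookkeeping, and the representation of the L1 cache memory as a set of tags are all hypothetical simplifications. LRU is used here as one of the permitted selection algorithms, not as the only one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Way:
    valid: bool = False
    tag: Optional[int] = None
    last_used: int = 0  # larger value means more recently used


def select_victim(ways):
    """Prefer an invalid way; otherwise pick the least recently used one."""
    for i, way in enumerate(ways):
        if not way.valid:
            return i
    return min(range(len(ways)), key=lambda i: ways[i].last_used)


def refill_l2_set(ways, new_tag, now, l1_tags):
    """Refill one L2 set and invalidate any L1 copy of overwritten valid data."""
    victim = select_victim(ways)
    evicted = ways[victim].tag if ways[victim].valid else None
    if evicted is not None and evicted in l1_tags:
        l1_tags.discard(evicted)  # conventional L1 invalidation for consistency
    ways[victim] = Way(valid=True, tag=new_tag, last_used=now)
    return victim, evicted


ways = [Way(), Way()]  # a 2-way set, initially all invalid
l1_tags = {10}         # assume tag 10 is also held in the L1 cache memory
a = refill_l2_set(ways, 10, now=1, l1_tags=l1_tags)
b = refill_l2_set(ways, 20, now=2, l1_tags=l1_tags)
c = refill_l2_set(ways, 30, now=3, l1_tags=l1_tags)  # overwrites tag 10
```

In this example the third refill overwrites the line holding tag 10, so the copy held in the L1 cache memory is invalidated even though the microprocessor may still be using it.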
Unfortunately, invalidating data in higher level cache memories, such as the L1 cache memory, as dictated by the conventional control technique lowers the overall throughput of the microprocessing system. Indeed, the L1 cache memory, for example, is not used optimally if the data therein are unnecessarily invalidated. Cached instructions or frequently accessed data in a loop body may be unnecessarily invalidated in this way, as often happens when a very large data array is accessed.
Accordingly, there are needs in the art for new methods and apparatus for controlling a cache memory, which may include an L1 cache memory, an L2 cache memory and/or further lower level cache memories, in order to improve memory efficiency, increase processing throughput and improve the quality of the overall data processing performed by the system.