The present invention relates to methods and apparatus for providing a software-implemented cache memory within a local memory of a processor having access to an external system memory.
Real-time, multimedia applications are becoming increasingly important. These applications require extremely fast processing speeds, such as many thousands of megabits of data per second. While some processing systems employ a single processor to achieve fast processing speeds, others are implemented utilizing multi-processor architectures. In multi-processor systems, a plurality of sub-processors can operate in parallel (or at least in concert) to achieve desired processing results.
In recent years, there has been an insatiable desire for higher data processing throughput because cutting-edge computer applications are becoming more and more complex, and are placing ever increasing demands on processing systems. Graphics applications are among those that place the highest demands on a processing system because they require vast numbers of data accesses, data computations, and data manipulations in relatively short periods of time to achieve desirable visual results. Conventional processors have very rapid cycle times (i.e., the unit of time in which a microprocessor is capable of manipulating data), on the order of a nanosecond or less, although the time required to access data stored in main memory may be considerably longer than the cycle time of the microprocessor. For example, the access time required to obtain a byte of data from a main memory implemented using dynamic random access memory (DRAM) technology is on the order of about 100 nanoseconds.
In order to ameliorate the bottleneck imposed by the relatively long access time to DRAM, those skilled in the art have utilized cache memories. A cache memory is significantly faster than DRAM, and augments the function of data storage provided by the main memory. For example, an L2 cache memory may be coupled externally to the processor or an L1 cache memory may be coupled internally with the processor, either of which is significantly faster than a main (or system) memory implemented utilizing DRAM technology. An L2 cache memory may be implemented utilizing, for example, static random access memory (SRAM) technology, which is approximately two to three times faster than DRAM technology. An L1 cache memory is usually even faster than an L2 cache memory.
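By way of illustration only, the benefit of such a cache memory can be estimated with a simple average access time calculation. The sketch below is in Python and uses the textbook model in which every access pays the cache access time and each miss additionally pays the main memory access time; this model is an assumption and is not drawn from the description above. The 100 nanosecond main memory figure comes from the text, the 40 nanosecond cache figure is an assumed value within the "two to three times faster" range given for SRAM, and the hit rates and the function name are hypothetical.

```python
def average_access_time_ns(cache_time_ns, main_memory_time_ns, hit_rate):
    """Estimate the average time per access for a single-level cache.

    Assumed model: every access pays the cache access time, and the
    fraction of accesses that miss also pays the main memory access time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0.0 and 1.0")
    miss_rate = 1.0 - hit_rate
    return cache_time_ns + miss_rate * main_memory_time_ns


# Assumed figures: 40 ns cache, 100 ns DRAM (from the text), 75% hit rate.
print(average_access_time_ns(40.0, 100.0, 0.75))  # prints 65.0

# With no hits at all, the cache lookup is pure overhead.
print(average_access_time_ns(40.0, 100.0, 0.0))   # prints 140.0
```

Under these assumed figures the average access time falls below the 100 nanosecond DRAM access time only when a sufficient fraction of accesses hit in the cache, which is why the choice of what data to keep in the cache memory matters.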
Due to the relatively high cost of cache memories, they are typically much smaller than main memory. Consequently, conventional algorithms have been employed to determine what data should be stored in the cache memory. These conventional algorithms may be based on, for example, the theoretical concept of “locality of reference,” which takes advantage of the fact that relatively small portions of a large executable program and associated data are used at any particular point in time. Thus, in accordance with the concept of locality of reference, only small portions of the overall executable program are stored in cache memory at any particular point in time.
The particularities of the known algorithms for taking advantage of locality of reference, or any other concept, for controlling the storage of data in a cache memory are too numerous to present in this description. Suffice it to say, however, that not every algorithm is suitable for every application, as the data processing goals of various applications may differ significantly. Further, in situations where there is weak data storage locality and/or little sequential memory access (e.g., the portions of the program and data that are needed are randomly located to some extent), little advantage is obtained by using a cache memory architecture.
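By way of illustration only, the effect of locality of reference on a cache memory can be sketched with a small simulation. The Python example below models a direct-mapped cache, which is only one of the many known arrangements and is chosen here purely as an assumption; the function name, the cache dimensions (64 lines of 16 bytes), and the two address streams are likewise hypothetical and are not taken from this description.

```python
import random


def cache_hit_rate(addresses, num_lines=64, line_size=16):
    """Return the fraction of accesses that hit in a simulated direct-mapped cache."""
    if not addresses:
        return 0.0
    # Each cache line remembers which memory block it currently holds.
    resident_block = [None] * num_lines
    hits = 0
    for address in addresses:
        block = address // line_size   # memory block containing the address
        index = block % num_lines      # the single line this block may occupy
        if resident_block[index] == block:
            hits += 1
        else:
            resident_block[index] = block  # miss: fetch the block into the line
    return hits / len(addresses)


# Strong locality: 4096 consecutive byte addresses.
sequential = list(range(4096))

# Weak locality: 4096 addresses scattered over a 1 MiB range (fixed seed).
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(4096)]

print(cache_hit_rate(sequential))  # prints 0.9375
print(cache_hit_rate(scattered))   # only a small fraction of accesses hit
```

In the sequential stream, only the first access to each 16-byte block misses, so fifteen of every sixteen accesses hit. In the scattered stream almost every access refers to a block that is not resident, which corresponds to the situation described above in which little advantage is obtained from the cache memory.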
The conventional approach to cache memory implementation requires a hardware cache memory located on chip (L1 cache) or off chip (L2 cache), which is expensive and takes up valuable space. A decision to employ a cache memory arrangement, therefore, should not be made without serious consideration. As there is no guarantee that a cache memory arrangement will yield advantageous performance in many instances, some processing systems do not employ one. Unfortunately, the decision not to employ a hardware-implemented cache memory has the disadvantageous effect of limiting processing throughput in those situations where some degree of locality of reference exists.
Accordingly, there are needs in the art for new methods and apparatus for implementing a cache memory, which may exploit at least some of the advantages of a hardware-implemented cache memory without incurring the disadvantageous implications in terms of expense and usage of space.