1. Field of the Invention
The present invention relates, in general, to cache memory and methods for using cache memory, and, more particularly, to a method and system that caches using a fraction of a memory device.
2. Relevant Background
Data processing systems rely on a variety of data storage mechanisms for storing data and program code. Each storage mechanism has an associated latency that describes a delay incurred in writing data to and reading data from the storage device. Storage mechanisms range from low latency mechanisms such as static random access memory (SRAM) located physically near data processing elements to magnetic, optical and remote storage mechanisms with latencies that are several orders of magnitude larger than SRAM. Mass storage devices tend to have greater latency than working memory located physically and logically close to a data processor.
There is a continuous need for techniques that can enhance performance without significantly increasing the cost and complication of a design. Caching is one technique implemented to improve performance of data storage systems. Cache technology hides latency associated with mass storage such as magnetic and optical disk storage devices. Cache technology involves providing a quantity of relatively low latency memory that holds a copy of selected program information, memory addresses or data that is otherwise stored in a higher latency storage device. Cache technology takes advantage of principles of locality of reference, both spatial and temporal, often present in stored data to select what portions of the data are copied into the cache mechanisms. So long as a copy of the data needed by the processing element is in the cache, the data processor only sees the delay associated with low latency devices, greatly improving performance.
Many data processing systems, embedded systems for example, use a single physical memory device for all of the system's memory requirements. This is practical because a single commercially available memory integrated circuit (IC) has sufficient capacity to serve all of these functions, and using multiple chips would be inefficient. However, allocating fractional portions of a single memory device to these disparate functions is problematic.
Integrated circuit (IC) memory by nature implements storage capacity in binary-sized increments (e.g., 2^16 = 64 Kbit or 2^24 = 16 Mbit). A particular problem exists in trying to allocate a portion of a memory IC as cache while reserving other portions for non-cache operations. By way of example, a disk drive uses memory to hold firmware tables and configuration information, but these require only a fraction of a conventional memory IC's capacity. The remaining memory capacity is desirably allocated to cache data from the slower magnetic or optical storage to improve disk access time. In the past it has been difficult to efficiently allocate only a fraction of an IC memory device to a cache.
Prior systems use a "segmented" memory architecture to allocate one or more segment(s) to caching. Each segment can be organized as a circular buffer. Adaptive segmenting techniques enable the number and size of segments to be dynamically modified. These techniques enable the single memory device to be effectively shared between cache and non-cache uses. Unfortunately, segmented architectures require complex control logic to implement. Moreover, a segmented memory often results in poorer performance than a traditional tag-memory-controlled cache architecture.
Tag-memory-controlled cache technology was largely developed for general purpose computer systems in which the memory mechanisms are implemented using multiple integrated circuit chips. Conventionally, a data address is split into a tag portion and an index portion. The tag portion includes the most significant bits of the memory address and the index portion includes the least significant bits.
When a cache line or cache block (the smallest addressable portion of the cache) is filled with data, the index portion of the target address identifies one or more sets of cache blocks that are available to be filled. One cache block in the identified set is selected and the data is written into the selected cache block while the tag portion is written into a tag memory associated with the cache block. When data is required from memory, the index portion is used to identify one or more sets of cache blocks that may contain the data. The tag memory for the identified sets is searched to determine whether the matching tag value is stored therein. Upon a match, the data can be read out from the cache and main memory access is avoided.
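The fill-and-lookup sequence described above can be sketched as follows for the simplest case of a direct-mapped cache (one block per set). The block size, block count, and field widths here are illustrative assumptions, not values taken from any particular design:

```python
# Sketch of a direct-mapped, tag-memory-controlled cache.
# Illustrative assumptions: 16-byte cache blocks, 256 blocks.
BLOCK_BITS = 4             # 2^4 = 16-byte cache block
INDEX_BITS = 8             # 2^8 = 256 blocks -> a binary-sized cache

tags = [None] * (1 << INDEX_BITS)   # tag memory, one entry per block
data = [None] * (1 << INDEX_BITS)   # the cache blocks themselves

def split(addr):
    """Split a target address into tag and index portions."""
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index

def fill(addr, block):
    """Fill: write data into the selected block, tag into tag memory."""
    tag, index = split(addr)
    tags[index] = tag
    data[index] = block

def lookup(addr):
    """Lookup: hit if the stored tag matches, else a miss
    (which would require a main memory access)."""
    tag, index = split(addr)
    if tags[index] == tag:
        return data[index]    # hit: main memory access avoided
    return None               # miss

fill(0x12345, b"example block")
assert lookup(0x12345) == b"example block"  # same tag and index: hit
assert lookup(0x99345) is None              # same index, different tag: miss
```

Note that `split` uses only shifts and masks; as discussed below, this is what makes the tag/index scheme computationally free.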
The split of the target address between tag and index portions is in effect a binary division process, but requires no computational resources to achieve. The index created from the lower bits of the target address covers a binary-sized tag memory, and thus a binary-sized cache. When the memory system is implemented with a single IC, with a fraction unavailable, a binary-sized cache would be limited to a maximum of one half of the available storage. Typically this limitation wastes resources, as the non-cache uses require much less than one half of the available memory space of a single IC. This limitation has prevented traditional tag-memory-controlled data caches from being implemented in many systems.
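The waste can be seen with a short arithmetic sketch. The 16 Mbit part and the 1 Mbit of non-cache reservations are illustrative assumptions chosen only to show the effect:

```python
# Illustrative assumption: a single 16 Mbit memory IC with 1 Mbit
# reserved for non-cache uses (firmware tables, configuration data).
total_mbit = 16
non_cache_mbit = 1
available = total_mbit - non_cache_mbit   # 15 Mbit usable for cache

# A binary-sized cache must be a power of two, so the largest cache
# that fits in the available space is only 8 Mbit.
cache = 1
while cache * 2 <= available:
    cache *= 2

assert cache == 8
print(available - cache, "Mbit of capacity left unused")  # 7 Mbit wasted
```

Nearly half of the usable capacity goes unallocated, even though only a small fraction of the IC was needed for non-cache purposes.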
In general, generating tag and index values from a target address to access a cache should be computationally efficient. Because the cache is accessed continuously, any latency associated with generating address information has a significant cumulative effect. For this reason, tag and index generation should take as few clock cycles as possible.