1. Technical Field
This invention relates in general to processing devices and, more particularly, to a cache architecture for a processing device.
2. Description of the Related Art
Most processing devices use a cache architecture to increase the speed of retrieving information from a main memory. A cache memory is a high speed memory that is situated between the processing core of a processing device and the main memory. The main memory is generally much larger than the cache, but also significantly slower. Each time the processing core requests information from the main memory, the cache controller checks the cache memory to determine whether the address being accessed is currently in the cache. If so, the information is retrieved from the faster cache memory instead of the slower main memory. If the information is not in the cache, the main memory is accessed, and the cache memory is updated with the information.
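The lookup sequence described above can be sketched in software. The following is a minimal illustrative model only, not the circuitry of any particular device; the class name `SimpleCache`, the direct-mapped organization, the four-line size, and the one-word-per-line layout are all assumptions made for the example.

```python
class SimpleCache:
    """Toy direct-mapped cache placed in front of a slower main memory.

    Assumptions for illustration: main memory is a dict keyed by word
    address, and each cache line holds exactly one word.
    """

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # address currently held by each line
        self.data = [None] * num_lines   # cached copy of that address
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines
        if self.tags[index] == address:
            # Cache hit: serve the request from the fast memory.
            self.hits += 1
            return self.data[index]
        # Cache miss: access main memory, then update the cache line.
        self.misses += 1
        value = self.main_memory[address]
        self.tags[index] = address
        self.data[index] = value
        return value


memory = {addr: addr * 10 for addr in range(16)}
cache = SimpleCache(memory)
values = [cache.read(a) for a in (3, 3, 7, 3)]
```

In this access pattern, addresses 3 and 7 map to the same line, so the access to 7 evicts 3 and the final access to 3 misses again. This kind of conflict is what makes execution time hard to predict with a conventional cache.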
As processing cores increase in speed relative to memory designs, the efficiency of the cache architecture becomes more significant. One way to increase efficiency is to increase the size of the cache. Since a larger cache memory can store more information, the likelihood of a cache hit is similarly increased. In most cases, however, increasing cache size has diminishing returns after a certain point. Further, increasing the cache size will increase the size of the chip (assuming the cache is integrated with the processing core). Even more importantly, access time will be increased, defeating the initial purpose of the cache. Accordingly, merely increasing the size of a cache will in many cases not produce worthwhile results.
In many devices, certain routines will have critical time constraints or will otherwise need a predictable execution time. In these cases, it can be critical to eliminate latencies due to cache misses. Some cache systems provide mechanisms for locking entries in a cache, so that the cache entries will not be overwritten as other locations are accessed. This mechanism is useful for entries that will be used repeatedly; however, locking entries of a cache reduces the size and associativity of the cache. For instance, in a 2-way set associative cache, locking some entries will result in a portion of the cache acting as a direct-mapped cache, greatly reducing the efficiency of the cache. A similar solution uses a local memory working in parallel with the cache system. This solution requires address decoding for the local memory and a cache disabling mechanism, which can result in latencies. Further, while an implementation with a local RAM may work with routines specifically written to use the local RAM, other routines, particularly OS (operating system) routines not written in anticipation of the specific local RAM configuration, will not be able to control the local RAM in the manner that the cache is controlled.
Therefore, a need has arisen for a cache architecture that increases cache performance and predictability.
In a first embodiment of the present invention, a processing device comprises a processing core having circuitry for generating addresses to access a main memory and an n-way cache. The n-way cache comprises n data memories each having a plurality of entries for storing information from the main memory, one or more tag memories for storing address information identifying a main memory address associated with each of the entries in a corresponding data memory, a plurality of tag registers for storing address information defining a contiguous block of main memory addresses, each tag register associated with a corresponding data memory, and control circuitry for defining a cache association between each data memory and either a tag memory or a tag register and selectively accessing each data memory in response to an address from the processing core based on the cache association.
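The association between a data memory and either a tag memory or a tag register can be illustrated with a small software model. This is a sketch under stated assumptions, not the claimed circuitry: the names `Way`, `fill`, and `lookup`, the four-entry data memories, the per-entry valid bits, and the one-word-per-entry layout are all hypothetical.

```python
LINE_COUNT = 4  # entries per data memory (hypothetical size)


class Way:
    """One data memory, associated with a tag memory or a tag register."""

    def __init__(self, use_tag_register=False, base_address=0):
        self.use_tag_register = use_tag_register
        # Tag register: one tag for the whole contiguous block of addresses.
        self.tag_register = base_address // LINE_COUNT
        # Tag memory: one tag per entry.
        self.tag_memory = [None] * LINE_COUNT
        self.valid = [False] * LINE_COUNT
        self.data = [None] * LINE_COUNT

    def hit(self, address):
        index, tag = address % LINE_COUNT, address // LINE_COUNT
        if not self.valid[index]:
            return False
        if self.use_tag_register:
            return tag == self.tag_register
        return tag == self.tag_memory[index]


def fill(way, address, value):
    """Load one entry; a tag-register way accepts only its own block."""
    index, tag = address % LINE_COUNT, address // LINE_COUNT
    if way.use_tag_register and tag != way.tag_register:
        raise ValueError("address outside the block named by the tag register")
    if not way.use_tag_register:
        way.tag_memory[index] = tag
    way.valid[index] = True
    way.data[index] = value


def lookup(ways, address):
    """Return (way number, data) for the first hitting way, else None."""
    for number, way in enumerate(ways):
        if way.hit(address):
            return number, way.data[address % LINE_COUNT]
    return None


# Way 0 uses a tag memory; way 1 is mapped to the block at addresses 8..11.
ways = [Way(), Way(use_tag_register=True, base_address=8)]
fill(ways[0], 2, "x")
fill(ways[1], 9, "isr")
```

The only difference between the two kinds of way in this model is where the comparison tag comes from, which reflects the point that the association is a matter of configuration rather than a separate memory structure.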
In a second embodiment of the present invention, a processing device comprises a processing core having circuitry for generating addresses to access a main memory, a first, n-way, cache subsystem, where n is greater than or equal to 1, and a second, m-way, cache subsystem, where m is greater than or equal to 1. The first cache subsystem comprises n data memories each having a plurality of entries for storing information from the main memory and n tag memories for storing address information identifying a main memory address associated with each of the entries in a corresponding one of the n data memories. The second cache subsystem comprises m data memories each having a plurality of entries for storing information from the main memory and m tag registers, each storing address information defining a contiguous block of main memory addresses mapped to a corresponding one of the m data memories. Logic determines cache hits in the first and second cache subsystems, where hits from the second cache subsystem have precedence over hits from the first subsystem.
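The precedence rule in the hit logic can be expressed as a short selection function. This is only an illustrative sketch; the function name `select_source`, the labels `"ramset"`, `"cache"`, and `"miss"`, and the representation of per-way hit signals as lists of booleans are assumptions, not part of the described device.

```python
def select_source(second_subsystem_hits, first_subsystem_hits):
    """Pick which data memory serves an access.

    Each argument is a list of per-way hit signals (booleans). A hit in
    the second (tag register) subsystem takes precedence over a hit in
    the first (tag memory) subsystem.
    """
    for way, hit in enumerate(second_subsystem_hits):
        if hit:
            return ("ramset", way)
    for way, hit in enumerate(first_subsystem_hits):
        if hit:
            return ("cache", way)
    return ("miss", None)


# Both subsystems report a hit: the second subsystem wins.
both = select_source([False, True], [True, False])
# Only the first subsystem hits.
first_only = select_source([False, False], [False, True])
# Neither hits, so main memory must be accessed.
neither = select_source([False, False], [False, False])
```

Giving the second subsystem priority means that a block mapped through a tag register is always served from its dedicated data memory, even if a stale copy of the same address happens to be present in the first subsystem.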
The present invention provides significant advantages over the prior art. First, the RAM set cache (mapped to a contiguous block of main memory addresses) can significantly improve the operation of a processing device performing real-time operations, since a desired block of code can be stored in the RAM set cache for fast retrieval. Second, there is no extra penalty for accessing a larger data memory for a RAM set cache, as long as the access time of the RAM set is not greater than the access time of the standard cache. Third, the addition of one or more RAM set caches can be provided with a minimal amount of circuitry over a conventional cache. Fourth, the RAM set caches can be configured in a very flexible manner with other caches, such as a set associative or direct-mapped cache, as desired. Fifth, the RAM set cache provides advantages over a local RAM, because a separate mechanism for loading the data memory is not necessary for the RAM set cache and no specific address decoding in series with the memory access time is required. Sixth, the cache can be controlled by the OS or other software in the same manner as an ordinary cache: loading, flushing, line invalidation, and so on can be performed by the software without knowledge of the specific architecture of the cache, or with minor modifications to a driver for the OS.