1. Technical Field
This invention relates in general to processing devices and, more particularly, to a cache architecture for a processing device.
2. Description of the Related Art
Most processing devices use a cache architecture to increase the speed of retrieving information from a main memory. A cache memory is a high speed memory that is situated between the processing core of a processing device and the main memory. The main memory is generally much larger than the cache, but also significantly slower. Each time the processing core requests information from the main memory, the cache controller checks the cache memory to determine whether the address being accessed is currently in the cache memory. If so, the information is retrieved from the faster cache memory instead of the slower main memory. If the information is not in the cache, the main memory is accessed, and the cache memory is updated with the information.
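The lookup sequence described above can be illustrated with a minimal sketch. It assumes a direct mapped organization with single-word lines purely for brevity; the class name `SimpleCache`, its fields, and the sizes are hypothetical and are not taken from this description.

```python
class SimpleCache:
    """Toy direct mapped cache in front of a slower main memory (illustrative only)."""

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory   # a list standing in for the main memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # address tag currently held by each line
        self.data = [None] * num_lines   # cached word for each line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines   # which cache line the address maps to
        tag = address // self.num_lines    # remaining address bits identify the block
        if self.tags[index] == tag:
            # Cache hit: the information comes from the fast cache memory.
            self.hits += 1
            return self.data[index]
        # Cache miss: access the main memory, then update the cache.
        self.misses += 1
        value = self.main_memory[address]
        self.tags[index] = tag
        self.data[index] = value
        return value


memory = list(range(100, 116))   # 16 words of pretend main memory
cache = SimpleCache(memory)
first = cache.read(5)            # miss, fetched from main memory
second = cache.read(5)           # hit, served from the cache
```

In this sketch the first access to an address is a miss and the repeated access is a hit, which is the behavior the paragraph above attributes to the cache controller.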
As processing cores increase in speed relative to memory designs, the efficiency of the cache architecture becomes more significant. One way to increase efficiency is to increase the size of the cache. Since a larger cache memory can store more information, the likelihood of a cache hit is similarly increased. In most cases, however, increasing cache size has diminishing returns after a certain point. Further, increasing the cache size will increase the size of the chip (assuming the cache is integrated with the processing core). Even more importantly, access time will be increased, defeating the initial purpose of the cache. Accordingly, merely increasing the size of a cache will in many cases not produce worthwhile results.
In many devices, certain routines will have critical time constraints or will otherwise need a predictable execution time. In these cases, it can be critical to eliminate latencies due to cache misses. Some cache systems provide mechanisms for locking entries in a cache, so that the cache entries will not be overwritten as other locations are accessed. This mechanism is useful for entries that will be used repeatedly; however, locking entries of a cache reduces the effective size and associativity of the cache. For instance, in a 2-way set associative cache, locking some entries will result in a portion of the cache acting as a direct mapped cache, greatly reducing the efficiency of the cache. A similar solution uses a local memory working in parallel with the cache system. This solution requires address decoding for the local memory and a cache disabling mechanism, which can result in latencies. Further, while an implementation with a local RAM may work with routines specifically written to use the local RAM, other routines, in particular OS (operating system) routines not written in anticipation of the specific local RAM configuration, will not be able to control the local RAM in the manner that the cache is controlled.
Therefore, a need has arisen for a cache architecture that increases cache performance and predictability.
In a first embodiment of the present invention, a processing device comprises a processing core having circuitry for generating addresses to access a main memory and a cache. The cache comprises a data memory having a plurality of entries for storing information from the main memory, a tag register for storing address information defining a contiguous block of main memory addresses mapped to the data memory, a global valid bit to be set to either a valid state indicating that the data stored in the corresponding data memory is valid or an invalid state indicating that the data stored in the corresponding data memory is not valid, and logic for determining cache hits in the cache based on a comparison of the address information in said tag register with corresponding bits from an address from the processing core and on the state of said global valid bit.
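The hit determination of the first embodiment can be sketched in software as follows. This is an illustrative model only, not the claimed circuitry; the class name `RamSetCache`, the `size_bits` parameter, and the example addresses are assumptions introduced for the example.

```python
class RamSetCache:
    """Software model of a cache with one tag register and one global valid bit."""

    def __init__(self, size_bits):
        self.size_bits = size_bits            # data memory spans 2**size_bits addresses
        self.tag_register = 0                 # upper address bits of the mapped block
        self.global_valid = False             # invalid until the tag has been set
        self.data = [0] * (1 << size_bits)    # the data memory entries

    def set_tag(self, base_address):
        # Keep only the upper address bits; they define the contiguous block.
        self.tag_register = base_address >> self.size_bits
        self.global_valid = True

    def invalidate(self):
        self.global_valid = False

    def is_hit(self, address):
        # Hit only if the upper address bits match the tag register
        # AND the global valid bit is in the valid state.
        tag_match = (address >> self.size_bits) == self.tag_register
        return tag_match and self.global_valid


ram_set = RamSetCache(size_bits=4)       # 16-entry data memory (assumed size)
hit_before = ram_set.is_hit(0x1234)      # global valid bit not yet set
ram_set.set_tag(0x1230)                  # maps addresses 0x1230 through 0x123F
hit_inside = ram_set.is_hit(0x1234)      # inside the mapped block
hit_outside = ram_set.is_hit(0x1244)     # outside the mapped block
```

Because a single tag covers the whole data memory, the comparison involves one register rather than one tag per line, which is the property the embodiment relies on.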
In a second embodiment of the invention, a processing device comprises a processing core having circuitry for generating addresses to access a main memory and a cache. The cache comprises a data memory having a plurality of entries for storing information from the main memory, a tag register defining a contiguous block of main memory addresses mapped to the data memory, control circuitry for selectively filling the data memory either by a set fill mode, where all locations of the data memory are filled after setting the tag register, or by a line-by-line fill mode, where entries in the data memory are filled in response to an access by the processing core to a corresponding address in the main memory, and logic for determining whether information accessed by the processing core is stored in the data memory.
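The difference between the two fill modes can be sketched as follows. The model is illustrative only; the per-entry valid flags, the `main_reads` counter, and all names are assumptions made for the example rather than details stated in this description.

```python
SET_FILL = "set_fill"
LINE_FILL = "line_by_line"


class FillableRamSet:
    """Software model of a data memory filled in set fill or line-by-line mode."""

    def __init__(self, main_memory, num_entries, mode):
        self.main_memory = main_memory
        self.num_entries = num_entries
        self.mode = mode
        self.base = None                           # start of the mapped block
        self.data = [None] * num_entries
        self.entry_valid = [False] * num_entries   # assumed per-entry valid flags
        self.main_reads = 0                        # main memory reads made to fill entries

    def _fetch(self, offset):
        self.data[offset] = self.main_memory[self.base + offset]
        self.entry_valid[offset] = True
        self.main_reads += 1

    def set_tag(self, base_address):
        self.base = base_address
        self.entry_valid = [False] * self.num_entries
        if self.mode == SET_FILL:
            # Set fill: load every location as soon as the tag is set.
            for offset in range(self.num_entries):
                self._fetch(offset)

    def read(self, address):
        in_block = (self.base is not None
                    and self.base <= address < self.base + self.num_entries)
        if not in_block:
            return self.main_memory[address]       # not mapped to this data memory
        offset = address - self.base
        if not self.entry_valid[offset]:
            # Line-by-line fill: load an entry on its first access.
            self._fetch(offset)
        return self.data[offset]


memory = list(range(64))
eager = FillableRamSet(memory, 8, SET_FILL)
eager.set_tag(16)                 # all 8 entries are loaded immediately
lazy = FillableRamSet(memory, 8, LINE_FILL)
lazy.set_tag(16)                  # nothing is loaded yet
value = lazy.read(18)             # the first access loads this one entry
```

In this model, set fill pays the full loading cost up front so that later accesses within the block never miss, whereas line-by-line fill defers the cost of each entry until it is first used.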
The present invention provides significant advantages over the prior art. First, the RAM set cache (mapped to a contiguous block of main memory addresses) can significantly improve the operation of a processing device performing real-time operations, since a desired block of code can be stored in the RAM set cache for fast retrieval. Second, there is no extra penalty for accessing a larger data memory for a RAM set cache, as long as the access time of the RAM set is no greater than the access time of the standard cache. Third, the addition of one or more RAM set caches can be provided with a minimal amount of circuitry over a conventional cache. Fourth, the RAM set caches can be configured in a very flexible manner with other caches, such as a set associative or direct mapped cache, as desired. Fifth, the RAM set cache provides advantages over a local RAM, because a separate mechanism for loading the data memory is not necessary for the RAM set cache and no specific address decoding in series with the memory access is required. Sixth, the cache can be controlled by the OS or other software in the same manner as an ordinary cache; loading, flushing, line invalidation, and so on can be performed by the software without knowledge of the specific architecture of the cache, or with minor modifications to a driver for the OS.