The present embodiments relate to performing update operations to a memory and, more particularly, to performing large-scale update operations to a memory.
Data access speed is a crucial parameter in the performance of many digital systems, and in particular in systems such as digital signal processors (DSPs) which perform high-speed processing of real-time data. Many types of memories and memory access methods have been developed with the goal of improving memory access speed. While the access speed to individual memory locations has increased, writing large blocks of data to memory remains problematic. Power considerations limit the number of memory locations which can be updated simultaneously, so that writing to numerous locations in a memory requires many write cycles, and incurs a significant time overhead.
Memory caching is a widespread technique used to improve data access speed in computers and other digital systems. Cache memories are small, fast memories holding recently accessed data and instructions. Caching relies on a property of memory access known as locality of reference. Temporal locality states that information recently accessed from memory is likely to be accessed again soon; the related property of spatial locality states that information located close to recently accessed information is likely to be accessed soon. When an item stored in main memory is required, the processor first checks the cache to determine if the required data or instruction is there. If so, the data is loaded directly from the cache instead of from the slower main memory, with very little delay. Due to locality, a relatively small cache memory can significantly speed up memory accesses for most programs.
When new data is to be written to the cache, a decision is made using a cache mapping strategy to determine where within the cache to write the new data. There are currently three prevalent mapping strategies for cache memories: the direct mapped cache, the fully associative cache, and the n-way set associative cache. In the direct mapped cache, a portion of the main memory address of the data, known as the index, completely determines the location in which the data is cached. The remaining portion of the address, known as the tag, is stored in the cache along with the data. To check if required data is stored in the cache memory, the processor compares the main memory address of the required data to the main memory address of the cached data. As the skilled person will appreciate, the main memory address of the cached data is generally determined from the tag stored in the location selected by the index of the required data. If a correspondence is found, the data is retrieved from the cache memory, and a main memory access is avoided. Otherwise, the data is accessed from the main memory. The drawback of the direct mapped cache is that the data replacement rate in the cache is generally high, since the location in which main memory data is cached is completely determined by the main memory address of the data. There is no leeway for alleviating contention for the same cache location by multiple data items, or for maintaining often-required data within the cache. The effectiveness of the cache is thus reduced.
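The direct mapped lookup described above can be illustrated with a minimal sketch. The names (`split_address`, `DirectMappedCache`), the use of the low-order address bits as the index, and the omission of any block offset are assumptions made for illustration only; they are not taken from any particular cache design.

```python
# Illustrative sketch of a direct mapped cache (all names are hypothetical).
# Assumption: the low-order bits of the address form the index and the
# remaining high-order bits form the tag; block offsets are ignored.

def split_address(addr, index_bits):
    """Split a main memory address into (tag, index)."""
    index = addr & ((1 << index_bits) - 1)
    tag = addr >> index_bits
    return tag, index


class DirectMappedCache:
    def __init__(self, index_bits):
        self.index_bits = index_bits
        # One location per index; each holds (tag, data) or None if empty.
        self.rows = [None] * (1 << index_bits)

    def lookup(self, addr):
        """Return the cached data on a hit, or None on a miss."""
        tag, index = split_address(addr, self.index_bits)
        entry = self.rows[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def fill(self, addr, data):
        """Cache data; the index alone decides which location is overwritten."""
        tag, index = split_address(addr, self.index_bits)
        self.rows[index] = (tag, data)
```

In this sketch, two addresses that share an index but differ in tag always contend for the same location, so filling one evicts the other. This is the contention drawback noted above.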
The opposite policy is implemented by the fully associative cache, in which the cache is arranged in rows and cached information can be stored in any row. The fully associative cache alleviates the problem of contention for cache locations, since data need only be replaced when the whole cache is full. In the fully associative cache, however, when the processor checks the cache memory for required data, every row of the cache must be checked against the address of the data. To minimize the time required for this operation, all rows are checked in parallel, requiring a significant amount of extra hardware.
The n-way set associative cache memory is a compromise between the direct mapped cache and the fully associative cache. The set associative cache is arranged in rows and, as in the direct mapped cache, the index of the address is used to select a row of the cache memory. However, in the n-way set associative cache each row contains n separate ways. Each way can store the tag, data, and any indicators required for cache management and control. For example, each way typically contains a validity bit which indicates whether the way contains valid or invalid data. Thus, if a way containing invalid data happens to give a cache hit, the data will be recognized as invalid and ignored, and no processing error will occur. In an n-way set associative cache, the main memory address of the required data need only be checked against the address associated with the data in each of the n ways of the corresponding row, to determine if the data is cached. The n-way set associative cache reduces the data replacement rate (as compared to the direct mapped cache) because data in addresses corresponding to the row can be stored in any of the ways in the row that are still available. As a further possibility, if all of the ways in the row are full, but some contain data that is less likely to be needed, then the new data can replace the data least likely to be needed. Such a system requires only a moderate increase in hardware.
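The n-way set associative lookup, the role of the validity bit, and the replacement of less-needed data can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the address is split as tag = address // rows and index = address % rows, and "least likely to be needed" is approximated by a least-recently-used policy, which is only one of several possible replacement policies.

```python
# Illustrative sketch of an n-way set associative cache (names hypothetical).
# Assumption: least-recently-used (LRU) stands in for "least likely to be needed".

class Way:
    def __init__(self):
        self.valid = False   # validity bit
        self.tag = 0
        self.data = None
        self.last_used = 0   # bookkeeping for the assumed LRU policy


class SetAssociativeCache:
    def __init__(self, num_rows, num_ways):
        self.num_rows = num_rows
        self.rows = [[Way() for _ in range(num_ways)] for _ in range(num_rows)]
        self.clock = 0

    def _split(self, addr):
        # Assumed address layout: (tag, index).
        return addr // self.num_rows, addr % self.num_rows

    def lookup(self, addr):
        """Check only the n ways of the selected row; None means a miss."""
        tag, index = self._split(addr)
        self.clock += 1
        for way in self.rows[index]:
            # A tag match in an invalid way is ignored.
            if way.valid and way.tag == tag:
                way.last_used = self.clock
                return way.data
        return None

    def fill(self, addr, data):
        """Store data in a matching way, else a free way, else the LRU way."""
        tag, index = self._split(addr)
        self.clock += 1
        row = self.rows[index]
        victim = next((w for w in row if w.valid and w.tag == tag), None)
        if victim is None:
            victim = next((w for w in row if not w.valid), None)
        if victim is None:
            victim = min(row, key=lambda w: w.last_used)
        victim.valid, victim.tag = True, tag
        victim.data, victim.last_used = data, self.clock
```

With two ways per row, two addresses mapping to the same row can be cached together, and a third replaces whichever of the two was used less recently.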
Cache memories store both memory data and control data. Reference is now made to FIG. 1, which shows an example of the data and control arrays for a 2-way set associative memory. The associative memory is organized into M rows, where the number of rows is determined by general hardware design considerations. Each of the M rows contains two ways, although in the general case of an n-way set associative memory, each row would have n ways. As described above, each way stores main memory data. Each way additionally stores control data including a tag (which together with the row index determines the main memory address of the data stored in a given way), and indicator flags, such as the validity bit, lock bit, and any other required indicators. Additional control data may also be stored on a per index basis, or for a block of indices.
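As a rough illustration of this organization, the sketch below models the control array and the data array as two separate M-by-n structures and recovers the main memory address of a way from its tag and row index. The names (`ControlEntry`, `make_arrays`, `main_memory_address`) and the address layout (tag in the high-order part, row index in the low-order part) are assumptions for illustration and are not taken from FIG. 1.

```python
# Illustrative model of per-way control data and memory data (names hypothetical).
from dataclasses import dataclass


@dataclass
class ControlEntry:
    tag: int = 0
    valid: bool = False    # validity bit
    locked: bool = False   # lock bit


def make_arrays(m_rows, n_ways):
    """Build an M x n control array and a matching M x n data array."""
    control = [[ControlEntry() for _ in range(n_ways)] for _ in range(m_rows)]
    data = [[None] * n_ways for _ in range(m_rows)]
    return control, data


def main_memory_address(control, row, way):
    """Combine the stored tag with the row index (assumed layout: tag * M + row)."""
    entry = control[row][way]
    if not entry.valid:
        return None  # an invalid way does not correspond to any cached address
    return entry.tag * len(control) + row
```

For example, with M = 8 rows, a valid way in row 3 holding tag 5 corresponds to main memory address 5 * 8 + 3 = 43 under the assumed layout.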
As with standard memories, cache memories suffer from speed problems when a large quantity of data is updated in the cache. In cache memories, write accesses may be made to both memory data and control data. Cache management functions often require performing control operations on large portions of the cache. For example, during cache reset the data is invalidated for every way in the cache. The number of write cycles required to reset the cache depends on the write access width to the cache. If the write access width enables writing to an entire index in a single cycle, the number of write cycles required to reset the cache equals the number of indices in the cache memory. However, if the write access width is a single way, the number of write cycles required to reset the cache equals the number of ways in the cache memory (which for an n-way set associative cache equals n times the number of indices).
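The dependence of the reset cost on the write access width can be made concrete with a short calculation. The function name and the `ways_per_write` parameter are hypothetical, and the handling of intermediate widths (more than one way but less than a full index per cycle) is an assumed generalization of the two cases described above.

```python
# Illustrative write-cycle count for a full cache reset (names hypothetical).
import math


def reset_write_cycles(num_indices, num_ways, ways_per_write):
    """Write cycles needed to invalidate every way in the cache.

    ways_per_write is the write access width, expressed as the number of
    ways of a single index that one write cycle can update.
    """
    if not 1 <= ways_per_write <= num_ways:
        raise ValueError("ways_per_write must be between 1 and num_ways")
    cycles_per_index = math.ceil(num_ways / ways_per_write)
    return num_indices * cycles_per_index
```

For a hypothetical 4-way cache with 256 indices, a full-index write width gives 256 cycles, while a single-way write width gives 4 * 256 = 1024 cycles.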
Memory access speeds are a crucial factor in the performance of many systems. Modifying the data in a large number of memory locations is generally a time-consuming operation, since the number of memory locations which can be updated simultaneously is generally limited by power considerations and the data access width. In order to initialize a block of memory, for example, the required value is stored in each memory location in the block, which may require many write cycles. During the initialization process access to the memory is generally blocked, to ensure that no data is read from a memory location before its data has been brought up-to-date. The resulting delays are particularly problematic in cache memories, since updates to multiple data items occur relatively frequently.
There is thus a widely recognized need for, and it would be highly advantageous to have, a memory update technique devoid of the above limitations.