1. Field of the Invention
The present invention is related to cache memories, and more particularly to a storage array tile that supports systolic movement within a storage array.
2. Description of Related Art
In present-day cache memory systems, there is a trade-off between the time required to access the most-frequently-accessed values and the number of such values available at the shortest access times. For example, in a traditional multi-level cache hierarchy, the level-one (L1) cache provides a uniform access time for a particular number of values, and control circuits and other algorithmic features of some systems operate to maintain the most-frequently-accessed values within the L1 cache. However, due to physical wiring constraints and the fact that electronic systems are limited by the propagation speed of electronic signals, the larger the L1 cache, the longer its (fixed) access time. Conversely, as the size of the L1 cache is reduced in order to reduce the access time, the number of frequently-accessed values that are not stored in the L1 cache increases. The values not stored in the L1 cache are available only from higher-order levels of the memory hierarchy (e.g., the L2 cache), which impose a much greater access-time penalty than the L1 cache. The typical cache memory system is inclusive, that is, higher-order levels of the memory hierarchy contain all values stored in the next lower-order level. For practical purposes, a given higher-order cache memory is therefore generally much larger than the cache memory of the next lower order, and given the propagation speed constraints mentioned above, e.g., RC wire delay and the eventual limitation of the inherent speed of electric field propagation in die interconnects, the higher-order cache is much slower, typically on the order of 10-100 times slower than the next lower-order cache memory.
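By way of illustration only, the trade-off described above can be sketched with the commonly used average-access-time model (hit time plus miss rate multiplied by miss penalty). The function name and all latencies and miss rates below are hypothetical values chosen for the example and are not taken from any particular system.

```python
def average_access_time(l1_hit_time, l1_miss_rate, miss_penalty):
    """Average access time of a two-level hierarchy, in cycles.

    l1_hit_time  -- fixed L1 access time (grows with L1 size)
    l1_miss_rate -- fraction of accesses not satisfied by the L1 cache
    miss_penalty -- additional time to obtain the value from the L2 cache
    """
    return l1_hit_time + l1_miss_rate * miss_penalty


# Hypothetical configurations: a small, fast L1 that misses more often,
# and a large, slow L1 that misses less often. Neither is free of cost.
small_l1 = average_access_time(l1_hit_time=1.0, l1_miss_rate=0.10, miss_penalty=20.0)
large_l1 = average_access_time(l1_hit_time=3.0, l1_miss_rate=0.02, miss_penalty=20.0)

if __name__ == "__main__":
    print(f"small L1: {small_l1:.2f} cycles, large L1: {large_l1:.2f} cycles")
```

Under these assumed numbers, the larger L1 cache reduces misses but its longer fixed access time is paid on every access, which is the tension the preceding paragraph describes.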
Further, the cache control algorithm employed in such cache memory systems typically handles one outstanding request to a cache level at a time. If an access request “misses” a cache, the access is either stalled or fails and must therefore be retried by the source of the request (e.g., the next lower-order cache level, or the processor memory access logic in the case of an L1 cache miss). The request is propagated away from the processor toward a higher-order level of cache memory, but retrying requests later at the L1 level ensures that access to the cache is still provided for other instructions that can execute while a hardware thread dependent on the requested value is waiting for the request to succeed. The alternative of stalling the entire processor pipeline is available, but imposes an even more severe performance penalty.
Finally, the organization of values in a cache memory hierarchy is typically imposed by control structures within the cache memory hierarchy, e.g., cache controllers, that track access patterns according to schemes such as least-recently-used (LRU) and organize the levels of cache to maintain the most-frequently-accessed values in the lower-order caches using cast-out logic.
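As a minimal sketch of the kind of bookkeeping described above, and not a description of any particular cache controller, a single cache level with LRU replacement and cast-out can be modeled as follows. The class name `LRULevel`, its methods, and the capacity are hypothetical and introduced only for illustration.

```python
from collections import OrderedDict


class LRULevel:
    """One cache level that tracks recency of use and casts out the LRU entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered least- to most-recently used

    def lookup(self, address):
        """Return the stored value on a hit (marking it most recent), else None."""
        if address not in self.entries:
            return None
        self.entries.move_to_end(address)
        return self.entries[address]

    def insert(self, address, value):
        """Store a value; return the (address, value) pair cast out, or None."""
        self.entries[address] = value
        self.entries.move_to_end(address)
        if len(self.entries) > self.capacity:
            # The least-recently-used entry is cast out of this level.
            return self.entries.popitem(last=False)
        return None


if __name__ == "__main__":
    l1 = LRULevel(capacity=2)
    l1.insert(0x10, "a")
    l1.insert(0x20, "b")
    l1.lookup(0x10)               # 0x10 becomes the most recently used entry
    print(l1.insert(0x30, "c"))   # 0x20 is cast out
```

In a multi-level hierarchy, the pair returned by `insert` would be handed to the next higher-order level; that hand-off is omitted here to keep the sketch short.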
Solutions other than the traditional cache memories and hierarchy described above have been proposed that permit multiple requests to be pipelined, but that require the imposition of fixed worst-case access latencies and buffering to control the flow of the pipelined information. Further, non-traditional cache memories have been proposed that have a non-uniform access latency and that are organized without using additional access measurement and cast-out logic. Such memories generally offer only a small potential improvement over the operation of present cache memories, because they swap cache entries to slowly migrate frequently-accessed values to “closer” locations while migrating less-frequently-used values to “farther” locations. Such non-uniform cache memories also require additional pathways to perform the swapping and are typically routed systems, in which switching circuits are used to perform selection of a particular cache bank.
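The slow, swap-based migration described above can be illustrated with a simplified sketch. It assumes, purely for the example, that each bank holds a single value and that the banks are kept in a list ordered from nearest to farthest; the function name `access_and_promote` is hypothetical.

```python
def access_and_promote(banks, address):
    """Search banks from nearest to farthest; on a hit, swap one step closer.

    banks   -- list of addresses, index 0 being the nearest (fastest) bank
    address -- the address being requested
    Returns the index at which the address was found, or None on a miss.
    """
    for index, stored in enumerate(banks):
        if stored == address:
            if index > 0:
                # A hit moves the value only one bank closer per access,
                # pushing the displaced value one bank farther away.
                banks[index - 1], banks[index] = banks[index], banks[index - 1]
            return index
    return None


if __name__ == "__main__":
    banks = ["a", "b", "c", "d"]
    for _ in range(3):
        access_and_promote(banks, "d")
    print(banks)  # "d" needs three accesses to reach the nearest bank
```

The sketch shows why the improvement is gradual: a value in the farthest of N banks needs N-1 accesses before it reaches the nearest bank.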
The above-incorporated U.S. Patent applications describe a memory array, in particular a novel spiral cache memory in which multiple requests can concurrently flow through the memory array tiles, moving requested values to a front-most tile. While such operation can be supported by global control logic, such logic defeats to some degree the advantages of an easily replicable and scalable tiled design. Further, matching the timing of global control to local element access times and wire interconnect delays always presents a challenge.
Therefore, it would be desirable to provide a storage tile that can support movement of values within storage arrays such as those described in the above-incorporated U.S. Patent applications to provide a replicable and scalable design that requires little global control support.