Cache memory is used to optimize system performance by temporarily storing data in high-speed memory devices, so that the data can be retrieved faster than it could be from low-speed memory such as disks. Cache memory mirrors the data in the low-speed memory so that each access to the data is made to the high-speed cache memory rather than directly to the low-speed memory. The initial access to the data incurs the delay of retrieving it from the low-speed memory, but once the data is stored in the cache memory, subsequent accesses to the data are made via the high-speed cache memory. The cache memory is structured to mirror a block of memory, so that subsequent accesses to data near the initially accessed data are also made via the high-speed cache memory. Cache memory is conventionally structured to provide access to multiple blocks of memory. As shown in FIG. 1, blocks C0, C1, C2, and C3 form discrete cache location areas within the overall cache memory 125.
FIG. 1 represents a conventional processing system with indexed cache memory. Blocks 110-117 represent sequential processes, applied in pipeline fashion to data from a processing entity 105. At block 113, a data request 132 is initiated; the request can be either a read or a write. For ease of understanding, read access is discussed herein; the principles discussed are applicable to write access as well, as would be evident to one of ordinary skill in the art. As shown, although block 113 initiates the data request 132, the requested data is accessed by block 116, via an access command 134 and a data transfer 136. Such "look ahead" accesses are particularly well suited to cache memory access systems: because memories have inherent latency, processes 114 and 115, which do not use the requested data, can be performed while the data is being fetched. Non-pipelined processing may be represented by omitting blocks 114 and 115 and combining blocks 113 and 116. In such a system, efficiencies are achieved whenever the requested data is already present in the cache memory, but the time required to access the data from memory is directly reflected in system performance whenever a memory access is required.
Shown at the link between blocks 112 and 113 is a stream of memory access demands 150. This example stream consists of requests for data within memory blocks A, J, P, C, F, J, H, etc. These requests enter the cache memory access system 120 at the cache control 160.
The cache control 160 assigns a cache location to each request. Conventional indexed cache memory access systems employ a straightforward mapping from a memory address to a cache location address, typically by using a portion of the memory address as the index to the cache memory. For example, if the cache memory 125 consists of 16 parallel cache locations, each able to contain 1024 data elements, the lower 10 bits of the memory address (2^10 = 1024) form the index to the data element within a cache location, and the next 4 bits (2^4 = 16) identify the particular cache location. The memory block 100 is shown organized as a 4 by 4 block structure, with memory blocks A, B, C, and D in row 0; E, F, G, and H in row 1; and so on. Because FIG. 1 has four cache locations, two bits of the memory address identify the cache location index: the row in which the memory block lies forms the location index to the cache memory 125.
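The address split described above can be sketched as follows. This is a minimal illustration assuming the 16-location, 1024-element geometry from the example; the function names, and the row-major lettering of the 4 by 4 memory block 100, are illustrative assumptions rather than part of the described system.

```python
# Sketch of conventional indexed cache addressing (assumed geometry:
# 16 cache locations x 1024 data elements per location).
OFFSET_BITS = 10  # 2**10 = 1024 elements per cache location
INDEX_BITS = 4    # 2**4 = 16 cache locations


def split_address(address: int) -> tuple[int, int]:
    """Return (cache_location_index, element_offset) for a memory address."""
    offset = address & ((1 << OFFSET_BITS) - 1)                 # lower 10 bits
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # next 4 bits
    return index, offset


# FIG. 1 simplification (hypothetical helper): blocks A..P laid out row-major
# in a 4 by 4 grid, so the row number selects one of locations C0..C3.
def fig1_location(block: str) -> int:
    position = "ABCDEFGHIJKLMNOP".index(block)
    return position // 4  # row number = cache location index


print(split_address(0x2C05))                   # (11, 5)
print([fig1_location(b) for b in "AJPCFJH"])   # [0, 2, 3, 0, 1, 2, 1]
```

The second line reproduces the assignments used in the example stream 150: A and C share C0, F and H share C1, and both requests for J map to C2.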
Other cache assignment techniques are also conventionally used. A common technique is based upon cache latency: a new memory request is assigned to whichever cache location has been idle the longest. Such a technique introduces the additional complexity of maintaining a record of cache idle times. To limit this complexity, a combination of cache latency and cache indexing techniques is commonly employed.
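A minimal sketch of latency-based ("longest idle") assignment follows. The per-location timestamp list plays the role of the cache-idle record mentioned above; the class name, the integer timestamps, and the omission of any in-use check are simplifying assumptions for illustration only.

```python
# Sketch of "longest idle" cache assignment. The last_used list is the extra
# cache-idle record; integer timestamps are a hypothetical stand-in for
# whatever idle bookkeeping a real controller would keep.
class LongestIdleAssigner:
    def __init__(self, num_locations: int) -> None:
        # -1 means "never used", so unused locations are chosen first.
        self.last_used = [-1] * num_locations

    def touch(self, location: int, now: int) -> None:
        """Record that a location was used at time `now` (e.g. on a cache hit)."""
        self.last_used[location] = now

    def assign(self, now: int) -> int:
        """Pick the location idle the longest and mark it used at `now`."""
        location = min(range(len(self.last_used)), key=self.last_used.__getitem__)
        self.touch(location, now)
        return location


assigner = LongestIdleAssigner(4)
print([assigner.assign(t) for t in range(1, 6)])  # [0, 1, 2, 3, 0]
```

Every hit must also update the record via `touch`, which is the bookkeeping cost that a pure indexing scheme avoids.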
With reference to the example stream 150 of FIG. 1, the first request, for block A, will be submitted to the memory 100, as a memory command 161, with instructions to store the block of data at A into cache location C0. The next request, for block J, will immediately follow this request, instructing the memory to place the block J into cache location C2. The next request, for block P, will immediately follow this request, instructing the memory to place the block P into cache location C3. These assignments are recorded in the cache table 170. Also contained in the cache table 170 is an "in-use" flag associated with each cache location. The in-use field is set when the data is requested, and cleared when the data access is completed. Initially, the in-use field for each cache location will be cleared.
As shown, upon receipt of the data from memory, cache location C0 will contain a copy of memory block A, identified as A' in FIG. 1; similarly, a copy of J, J', will be in cache location C2; and a copy of P, P', will be in cache location C3. The cache table 170 will also be utilized when the data access at process 116 is executed, for it is the cache table 170 which identifies where the data block is stored.
The next request, for block C, cannot be submitted to the memory 100, because block C also maps to cache location C0, which is currently in use, and there is no assurance that the copy of block A in cache location C0 will have been read by process 116 before the memory places block C into cache location C0. Thus, process 113 must wait until the data access for A, at process 116, is completed before its request for C can be submitted. In most cases, this halt at 113 will force a halt in processes 112, 111, etc., thereby slowing the entire system. After the data access to cache location C0 is completed, as indicated by a cleared C0 in-use flag in the cache table 170, process 113's request for memory block C will be submitted by the controller 160, followed immediately by the request for memory block F, to be placed in cache location C1.
The next request, for memory block J, will be recognized by the cache controller 160 as one that can be satisfied from the cache memory, because the cache table 170 shows that cache location C2 contains the copy of memory block J fetched in response to the first request for J, above. Thus, no memory command 161 for block J is submitted to the memory 100 in response to this data request 132 for J from process 113. When this second data access to block J is executed at process 116, the cache table will still show J assigned to cache location C2, where the copy of block J still resides. To ensure that the first access to block J does not clear the in-use flag before the second access to block J occurs, a numeric variable is used as the in-use flag; this variable is incremented for each data request and decremented after completion of each data access. A cache location is in use whenever the value of this flag is not zero.
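The reason for a numeric in-use flag can be shown by replaying the two requests for block J against a boolean flag and against a counter. Both class names and the helper function are hypothetical; the sketch only models the flag for cache location C2.

```python
# Contrast a single boolean in-use flag with the numeric in-use counter,
# replaying the two requests for block J (both mapped to cache location C2).
class InUseFlag:
    def __init__(self) -> None:
        self.flag = False

    def on_request(self) -> None:
        self.flag = True

    def on_access_done(self) -> None:
        self.flag = False

    def busy(self) -> bool:
        return self.flag


class InUseCounter:
    def __init__(self) -> None:
        self.count = 0

    def on_request(self) -> None:
        self.count += 1  # incremented for each data request

    def on_access_done(self) -> None:
        self.count -= 1  # decremented after each completed data access

    def busy(self) -> bool:
        return self.count != 0  # in use whenever the count is not zero


def c2_still_protected(tracker) -> bool:
    tracker.on_request()      # first request for J (miss, fetched into C2)
    tracker.on_request()      # second request for J (hit in C2)
    tracker.on_access_done()  # process 116 finishes the first access to J
    return tracker.busy()     # should be True: the second access is pending


print(c2_still_protected(InUseFlag()))     # False: C2 could be overwritten early
print(c2_still_protected(InUseCounter()))  # True
```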
The next request, for block H, which also maps to cache location C1, will be submitted to the memory 100 only after the prior access to F is satisfied and cache location C1 is no longer in use.
When process 116 requests access to the data at a memory location, the cache controller 160 determines which cache location index is associated with the memory block containing the memory location. If the data has been received from the memory, via 101, in response to the previously submitted memory command 161, the data is communicated to the process 116 from the indexed cache location, via the path 126-136.
FIG. 2 shows a flowchart for a conventional cache memory access system. FIG. 2a shows a data request process, and FIG. 2b shows a data access process. The cache controller receives a request for data at a given memory address, at 200. The cache controller determines the index to the cache location associated with this memory address, at 210. It also determines whether the requested data is already located in the cache location, at 220. If it is not already in the cache location, a check is made as to whether that cache location is currently in use, at 230. This check is repeated until the cache location is not in use, as shown by the wait loop 240. If the cache location is free, the memory is accessed and the data is placed in the cache location, at 250. Note that step 250 can be a spawned process, so that the system can perform other tasks while the memory is providing the data to the indexed cache location.
When the data is in the cache location, either having been in the cache location as determined at 220, or having been accessed from memory at 250, the requesting process is provided access to the data as shown in the flowchart of FIG. 2b. The request for access to data at a memory address is received, at 260. The cache location index is determined from the memory address, at 264, in the same manner as step 210 in FIG. 2a. The controller then provides access to the requested data in the indexed cache location, at 268.
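The two flowcharts can be sketched as a small thread-safe class, with a condition variable standing in for the wait loop 240. This is a minimal sketch under stated assumptions: `memory` is a hypothetical dict mapping block numbers to lists of data elements, the fetch at step 250 is done synchronously rather than as a spawned process, and the class and method names are illustrative.

```python
import threading


# Sketch of the FIG. 2 flowcharts for a conventional indexed cache.
class IndexedCache:
    def __init__(self, memory: dict, num_locations: int, block_size: int) -> None:
        self.memory = memory                 # hypothetical backing store
        self.num_locations = num_locations
        self.block_size = block_size
        self.tags = [None] * num_locations   # cache table: block held per location
        self.lines = [None] * num_locations  # cached copy of each block
        self.in_use = [0] * num_locations    # numeric in-use flag per location
        self.cond = threading.Condition()

    def _index(self, address: int) -> int:
        # Steps 210/264: derive the location index from the address
        # (equivalent to a bit field when num_locations is a power of 2).
        return (address // self.block_size) % self.num_locations

    def request(self, address: int) -> None:
        """FIG. 2a: steps 200-250."""
        block = address // self.block_size
        loc = self._index(address)
        with self.cond:
            # Steps 220-240: if the block is not cached, wait until the
            # location is no longer in use.
            while self.tags[loc] != block and self.in_use[loc]:
                self.cond.wait()
            if self.tags[loc] != block:      # step 250: fetch from memory
                self.tags[loc] = block
                self.lines[loc] = self.memory[block]
            self.in_use[loc] += 1            # location reserved for this request

    def access(self, address: int):
        """FIG. 2b: steps 260-268."""
        loc = self._index(address)
        with self.cond:
            value = self.lines[loc][address % self.block_size]  # step 268
            self.in_use[loc] -= 1            # this data access is complete
            self.cond.notify_all()           # wake requests waiting at 240
        return value


memory = {b: [b * 10 + i for i in range(4)] for b in range(8)}
cache = IndexedCache(memory, num_locations=4, block_size=4)

cache.request(5)             # block 1 -> location 1 (miss, fetched)
print(cache.access(5))       # 11

cache.request(0)             # block 0 -> location 0, now in use
# Block 4 also maps to location 0, so this request waits at steps 230/240.
waiter = threading.Thread(target=cache.request, args=(16,))
waiter.start()
print(cache.access(0))       # 0; completing the access frees location 0
waiter.join()
print(cache.access(17))      # 41, element 1 of block 4
```

In a single-threaded caller, issuing the request for block 4 before the access to block 0 completes would block forever; that is the pipeline halt at process 113 described for FIG. 1.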
To optimize the performance of a cache memory access system, the number of parallel cache locations, or cache lines, is chosen based on the relative speed of access to the memory 100 and the expected nature of memory accesses. A pause is required whenever a memory request has the same index as a prior request, still in use, for a different memory block. These pauses arise from uncertainty in the latency of fetching data from memory and in the time at which the using process completes its data transfer. Because of this uncertainty, simple pipelining cannot be used to remove the pauses.
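As a rough illustration of how the number of locations trades against memory speed, assume (this is not stated in the text) that cache location indices are independent and uniformly distributed, and that k further requests are issued during one memory access time. The probability that at least one of them shares the index of a pending request is then 1 - (1 - 1/N)^k for N locations. This simple probability model is a stand-in for, not a reproduction of, a full queuing analysis; the function name is hypothetical.

```python
from fractions import Fraction


def same_index_probability(num_locations: int, requests_in_flight: int) -> Fraction:
    """P(at least one of k later requests shares a pending request's index).

    Assumes independent, uniformly distributed cache location indices;
    real access patterns are rarely this benign.
    """
    avoid_one = 1 - Fraction(1, num_locations)  # one request avoids the index
    return 1 - avoid_one ** requests_in_flight


for n in (4, 8, 16):
    print(n, float(same_index_probability(n, 3)))
```

With k = 3, the probability drops from 37/64 for four locations to 721/4096 for sixteen: more locations help, but the probability never reaches zero.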
Using conventional queuing theory techniques, the appropriate tradeoff can be made between the cost of additional cache locations, the likelihood of causing a memory access halt, or pause, in the process, and the expected duration and impact of such a pause. Although additional cache locations reduce the likelihood of pauses in the processing stream, conventional indexed cache memory access systems are still susceptible to certain patterns of memory access. For example, a conventional indexed cache memory access system will exhibit significant performance degradation if a number of memory requests having the same cache location index are received in succession, regardless of the number of cache locations provided. Additional cache locations reduce the likelihood that the same cache location index is assigned more than once within one memory access time, but a pause still occurs whenever that happens.
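The sensitivity to access patterns can be reproduced with a deliberately crude model. Assumptions (not from the source): the index is the block number modulo the number of locations, each request keeps its location in use until `window` later requests have been issued, and a pause lets every outstanding access finish. The function name and the `window` parameter are hypothetical.

```python
from collections import deque


def count_stalls(blocks, num_locations: int, window: int) -> int:
    """Count pauses for a conventional indexed cache under a simplified model."""
    tags = [None] * num_locations
    pending = deque()  # locations of requests whose data is still in use
    stalls = 0
    for block in blocks:
        loc = block % num_locations
        if tags[loc] != block and loc in pending:
            stalls += 1      # same index, different block: must wait
            pending.clear()  # the wait lets every outstanding access finish
        tags[loc] = block
        pending.append(loc)
        if len(pending) > window:
            pending.popleft()
    return stalls


same_index = [0, 64, 128, 192, 256]  # every block maps to index 0 for N = 4, 8, 16
for n in (4, 8, 16):
    print(n, count_stalls(same_index, n, window=2))  # 4 stalls each time
print(count_stalls([0, 4, 1, 5], 4, window=2))       # 2
print(count_stalls([0, 4, 1, 5], 8, window=2))       # 0
```

For the mixed stream, doubling the locations removes both pauses; for the same-index stream, every request after the first pauses no matter how many locations are provided.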
Also, to minimize the complexity and maximize the performance of the cache controller, conventional indexed memory access systems use a subset of the memory address bits to determine the cache location index. This, however, requires the number of cache locations to be a power of 2, so each improvement in performance requires doubling the size, and cost, of the cache memory. An incremental improvement cannot be made, which often forces a design choice between insufficient cache memory and excessive cost.
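The power-of-2 restriction can be checked directly. Taking a field of address bits amounts to ANDing the address with (number of locations - 1). That matches reducing the address modulo the number of locations only when that number is a power of 2. The helper name below is hypothetical.

```python
def bit_field_index_works(num_locations: int, address_limit: int = 4096) -> bool:
    """True if address & (num_locations - 1) equals address % num_locations
    for every address below address_limit."""
    mask = num_locations - 1
    return all((a & mask) == (a % num_locations) for a in range(address_limit))


print([n for n in range(1, 17) if bit_field_index_works(n)])  # [1, 2, 4, 8, 16]

# With 12 locations, a mask of 11 (binary 1011) never produces indices 4-7,
# so a third of the cache would go unused.
print(sorted({a & (12 - 1) for a in range(64)}))  # [0, 1, 2, 3, 8, 9, 10, 11]
```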
Therefore, a need exists for a memory access system that is less susceptible to patterns of memory access, and whose performance can be improved by an incremental addition of cache locations.