Cache memory is used to optimize computer system performance by temporarily storing data in memory devices that allow for high speed access, in comparison to data retrieval from low speed memory such as disks or tapes. Cache memory is used to mirror the data in the low speed memory so that each access to the data is effected as an access to the high speed cache memory, thereby avoiding the latency associated with an access to the low speed memory. The initial access to the data incurs the latency time loss to access the data from the low speed memory, but once the data is stored in the cache, subsequent accesses to the data are via the high speed cache access. The cache is structured to mirror a block of memory, so that subsequent access to data in proximity to the initially accessed data is also via the high speed cache access. Cache memory is conventionally structured to provide access to multiple blocks of memory. As shown in FIG. 1, blocks C0, C1, C2, and C3 form cache memory areas within a cache memory 130.
FIG. 1 represents a conventional processing system with indexed cache memory. The client process 110 accesses data contained in the memory 100 via the memory access system 120. The client process 110 communicates a stream of data commands 115 via the command bus 112, and the data associated with the command stream 115 is communicated via the data bus 111.
The memory access system 120 contains a cache memory 130 partitioned into cache locations C0, C1, C2, and C3. Each of these cache locations is capable of storing a copy of a block of memory A, B, C, etc. of memory 100. The cache memory 130 has a speed of access that is substantially greater than the speed of access of memory 100. By storing copies of the blocks of memory 100 in the cache memory 130, substantial access speed improvements can be achieved when multiple accesses to the data within a block occur.
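The benefit of repeated accesses to a cached block can be illustrated with simple arithmetic. The following is a minimal sketch; the function name `total_latency` and the latency figures (100 time units for memory 100, 1 time unit for cache memory 130) are assumptions chosen for illustration and do not come from the system described above.

```python
def total_latency(accesses: int, slow: int, fast: int, cached: bool) -> int:
    """Total time for `accesses` (at least one) reads within a single block.

    Without a cache, every access pays the slow memory latency.
    With a cache, only the first access pays it; the rest hit the cache.
    """
    if not cached:
        return accesses * slow
    return slow + (accesses - 1) * fast


# Assumed figures: slow memory costs 100 units, cache costs 1 unit.
without_cache = total_latency(10, slow=100, fast=1, cached=False)
with_cache = total_latency(10, slow=100, fast=1, cached=True)
```

Under these assumed figures, ten accesses cost 1000 units without the cache and 109 units with it. A single access gains nothing, which is why the improvement depends on multiple accesses to the same block.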
The data commands from the client process 110 are received by the operation generator 140 within the memory access system 120. The client data commands direct a transfer of data to or from a memory address, such as a read or write of data, or a combination, such as read-modify-write of data. The operation generator 140 generates a series of commands applicable to the memory control 160 and the memory 100 to accomplish each client data command. The operation generator interprets the data command to determine which memory block A, B, C, etc. of memory 100 includes the requested memory address. It also determines whether a copy of the identified memory block is already contained in the cache memory 130. If the memory block is in the cache memory, the operation generator identifies which cache location C0, C1, etc. contains the copy of the memory block, and formulates a command to effect the data command with this identified cache location.
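The lookup performed by the operation generator can be sketched as follows. This is an illustrative sketch only: the block size of 256, the names `block_of` and `find_location`, and the use of a simple list of tags are assumptions and are not taken from the system of FIG. 1.

```python
BLOCK_SIZE = 256  # assumed number of addresses per memory block


def block_of(address: int) -> int:
    """Return the index of the memory block that includes the address."""
    return address // BLOCK_SIZE


def find_location(tags, address):
    """Return the index of the cache location holding the block, or None.

    tags[i] is the index of the memory block copied in cache location Ci,
    or None if that location is unallocated.
    """
    block = block_of(address)
    for location, tag in enumerate(tags):
        if tag == block:
            return location
    return None


tags = [0, 5, None, 2]             # C0 holds block 0, C1 block 5, C3 block 2
hit = find_location(tags, 1300)    # address 1300 lies in block 5, held in C1
miss = find_location(tags, 4096)   # address 4096 lies in block 16, not cached
```

Here `hit` identifies cache location C1 (index 1), and `miss` is `None`, the case in which a cache location must be allocated to the memory block.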
If the memory block is not contained in the cache memory, the operation generator allocates one of the cache locations to this memory block. Typically, the allocated cache location will have been allocated to another memory block prior to this data command. Therefore, the operation generator must determine whether some action must be taken with regard to the data currently stored in the identified cache location. If, for example, the copy of the data in the cache location had only been used for reading the data contained in a memory block, no action need be taken, and the new memory block data will merely overwrite the prior data. If, however, new data had been written to this cache location, intending to be written to the associated memory block, the copy of the data in the cache location must be written to the memory block before the new memory block data is read into this cache location. Thus, in this case, the operation generator will formulate a command to write the data in the cache location to its previously associated memory block, followed by the command to read the new memory block into this cache location. The command to write data from the cache location to the memory is termed a "flush" of the cache location; the command to read data into the cache location from the memory is termed a "fill" of the cache location.
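The flush and fill decision described above can be sketched as follows. The `Location` record, its `dirty` flag, and the function `commands_for_miss` are hypothetical names introduced for illustration; the sketch assumes a single flag records whether new data has been written to the cache location since it was last filled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    block: Optional[int] = None  # memory block currently copied here, if any
    dirty: bool = False          # True if written since the last fill


def commands_for_miss(index: int, loc: Location, new_block: int) -> list:
    """Return the commands needed to reallocate a cache location."""
    commands = []
    if loc.block is not None and loc.dirty:
        # Modified data must reach its memory block before being overwritten.
        commands.append(("flush", index, loc.block))
    commands.append(("fill", index, new_block))
    # After the fill, the location mirrors the new block and is unmodified.
    loc.block, loc.dirty = new_block, False
    return commands


read_only = commands_for_miss(0, Location(block=3, dirty=False), 7)
modified = commands_for_miss(1, Location(block=3, dirty=True), 7)
```

In this sketch, `read_only` contains only a fill, while `modified` contains a flush of block 3 followed by a fill of block 7, in that order.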
When the cache memory is full and another request arrives, the operation generator allocates one of the cache locations to the new request. A variety of allocation algorithms can be applied to determine which cache location is to be reallocated, such as least recently used algorithms, indexed algorithms, and others. Before the operation generator reallocates one of the cache locations, it must first determine whether the data contained in the cache location is still needed. Typically, the data will be needed if it has been modified and the modifications have not been written back to memory. If the data has not been written back to the memory, the new data request cannot be processed in the cache location until the modified data has been written back to the memory. While this writing occurs, the processing of the data request is halted, which, depending on the nature of the data, may completely halt the processing of the computer system.
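As one example of the allocation algorithms mentioned above, a least recently used selection can be sketched as follows. The class `LRUTracker` and its methods are hypothetical names for illustration; this is one possible bookkeeping scheme, not the scheme of any particular system.

```python
from collections import OrderedDict


class LRUTracker:
    """Tracks cache locations from least to most recently used."""

    def __init__(self, locations):
        # Initial order is taken as the initial recency order.
        self._order = OrderedDict((loc, None) for loc in locations)

    def touch(self, loc):
        """Record an access, making loc the most recently used."""
        self._order.move_to_end(loc)

    def victim(self):
        """Return the least recently used location, the reallocation candidate."""
        return next(iter(self._order))


tracker = LRUTracker(["C0", "C1", "C2", "C3"])
tracker.touch("C0")
tracker.touch("C2")
candidate = tracker.victim()
```

After C0 and C2 are accessed, `candidate` is C1, the location that has gone longest without an access. Whether that location can be reallocated immediately still depends on whether its data has been modified and not yet written back.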
There are several techniques to minimize the occurrence of a processing halt. For example, in a pipeline process, memory access requests are provided a few stages ahead of when the data is needed. But, if the data is not available when it is to be processed, the process is halted until the data is available. By providing stages between the request and the data availability, the memory access system is provided time to obtain the data from the slower memory, and therefore, the likelihood of the client process having to be halted is reduced.
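The effect of issuing a request ahead of its use can be modeled in a few lines. This is a deliberately simplified sketch: the function `stall_cycles`, the assumption that each pipeline stage takes one cycle, and the example latency of 8 cycles are illustrative assumptions only.

```python
def stall_cycles(memory_latency: int, stages_ahead: int) -> int:
    """Cycles the client process is halted waiting for its data.

    Assumes one cycle per pipeline stage and a request issued
    `stages_ahead` stages before the data is needed.
    """
    return max(0, memory_latency - stages_ahead)


short_lead = stall_cycles(memory_latency=8, stages_ahead=3)
long_lead = stall_cycles(memory_latency=8, stages_ahead=10)
```

With a lead of 3 stages the process is halted for 5 cycles; with a lead of 10 stages the data arrives before it is needed and no halt occurs.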
Another technique is to "spawn", as sub-processes, current and subsequent commands before they are completed. The asynchronous nature of spawned processes, however, requires control in the sequencing of the spawned commands. Consider, for example, a command to flush modified data, followed by a command to fill from the same memory block. If the fill and flush commands are processed asynchronously and in parallel, the fill may occur before the flush. If the fill occurs before the flush, the modified data in the cache location will be overwritten by the data filled from memory, and will be incorrect. To avoid the potential errors caused by spawned processes, the commands and data must be processed in a coherent manner.
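The ordering hazard between a flush and a fill can be demonstrated with a small simulation. The function `run` and the values "old" and "new" are hypothetical; the sketch models a single cache location holding a modified copy of one memory block.

```python
def run(order):
    """Apply flush and fill in the given order; return (memory, cache) data."""
    memory = {"A": "old"}
    cache = {"block": "A", "data": "new"}  # modified copy, not yet flushed
    for op in order:
        if op == "flush":
            # Write the cache location back to its memory block.
            memory[cache["block"]] = cache["data"]
        elif op == "fill":
            # Read the memory block into the cache location.
            cache["data"] = memory[cache["block"]]
    return memory["A"], cache["data"]


in_order = run(["flush", "fill"])
out_of_order = run(["fill", "flush"])
```

When the flush precedes the fill, both memory and cache end with the modified data. When the fill occurs first, the modification is overwritten in the cache and the subsequent flush writes the stale data back, so the modification is lost in both places.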
A direct means of assuring data consistency is to force a strict ordering of commands, precluding the execution of a command until the preceding command has been completed. This purely sequential processing, however, is inefficient, because not all commands are dependent upon one another. For example, if a read command follows a flush command, there is no need to delay the execution of the read command until the flush command completes.
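One way to decide whether two commands are dependent is sketched below. The rule used here is an assumption made for illustration and is not stated in the text above: two commands are treated as dependent only if they address the same cache location and at least one of them modifies the contents of that location. The names `Command`, `MODIFIES_LOCATION`, and `depends_on` are likewise hypothetical.

```python
from typing import NamedTuple


class Command(NamedTuple):
    op: str        # "read", "write", "fill", or "flush"
    location: int  # index of the cache location addressed


# Assumed classification: writes and fills change the cache location's
# contents; reads and flushes only read them.
MODIFIES_LOCATION = {"write", "fill"}


def depends_on(later: Command, earlier: Command) -> bool:
    """Return True if `later` must wait for `earlier` under the assumed rule."""
    if later.location != earlier.location:
        return False
    return later.op in MODIFIES_LOCATION or earlier.op in MODIFIES_LOCATION


flush = Command("flush", 0)
read_after_flush = depends_on(Command("read", 0), flush)
fill_after_flush = depends_on(Command("fill", 0), flush)
```

Under this rule a read following a flush need not wait, consistent with the example above, whereas a fill of the same cache location must wait for the flush to complete.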
The processing of commands, even with dependency checks, must still occur sequentially, to avoid memory deadlocks. That is, for example, when all the cache locations are allocated and a new read request arrives, the dependency check will hold the read pending until one of the cache locations is flushed. The flushing of a cache location is held pending until the completion of the read or write requests to this cache location. Unless tight controls are placed on the ordering and processing of read, write, fill, and flush operations, the flushing of a cache location can become dependent upon the completion of a read request that is itself pending upon the completion of this flushing, thereby resulting in a deadlock situation, precluding subsequent processing.
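The deadlock described above corresponds to a cycle of mutual waiting, which can be detected as sketched below. The function `has_deadlock` and the representation of pending operations as a dictionary of "waits for" relationships are assumptions introduced for illustration.

```python
def has_deadlock(waits_for: dict) -> bool:
    """Return True if the waits-for relationships contain a cycle.

    waits_for maps each pending operation to the operations whose
    completion it is waiting upon.
    """
    visiting, done = set(), set()

    def visit(node) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # reached an operation already on the current path
        visiting.add(node)
        for nxt in waits_for.get(node, ()):
            if visit(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(waits_for))


# The read waits for the flush, and the flush waits for the read.
circular = has_deadlock({"read": ["flush"], "flush": ["read"]})
# The read waits for the flush, which waits on nothing.
ordered = has_deadlock({"read": ["flush"], "flush": []})
```

The first case reports a deadlock because neither operation can ever complete; the second does not, because the flush can proceed and release the read.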
In the conventional cache system of FIG. 1, the command buffer 170 is a first-in first-out (FIFO) buffer, thereby assuring the proper sequencing of commands. The dependency checks are applied to each command as it is removed from the FIFO. If a command is dependent on a preceding command, the memory controller merely waits until the preceding command completes before commencing the execution of the subsequent command.
The sequential nature of the FIFO command buffer 170, however, introduces performance penalties upon all the commands within the FIFO. That is, while the command buffer 170 pauses to wait for the completion of a command, none of the commands in the command buffer are being executed, even if they could be executed without affecting the data coherency. When these commands arrive at the output of the FIFO, they will be immediately executed, but in the meantime they have incurred the delay caused by the dependent previous commands. Additionally, as is evident in the above description, the specific sequencing of commands is determined by the sequence of the arriving commands from the client process. As such, sequences which may provide for a more optimal memory access cannot be realized.
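The cumulative delay imposed by a FIFO command buffer can be illustrated with a small timing model. This sketch rests on assumptions not found in the text above: each command takes one time unit to execute, each command carries a `ready_time` at which its dependencies are satisfied, and the function name `fifo_start_times` is hypothetical.

```python
def fifo_start_times(commands):
    """Return the start time of each command under strict in-order issue.

    commands is a list of (name, ready_time) pairs in arrival order.
    Each command is assumed to take one time unit.
    """
    now = 0
    starts = {}
    for name, ready_time in commands:
        now = max(now, ready_time)  # wait at the FIFO output if not ready
        starts[name] = now
        now += 1
    return starts


# The fill must wait until time 10; the two reads are ready immediately.
queue = [("fill C0", 10), ("read C1", 0), ("read C2", 0)]
starts = fifo_start_times(queue)
```

Although both reads could have executed at time 0, they start at times 11 and 12 because they sit behind the waiting fill. The delay of the first command is thus inherited by every command queued behind it.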
Therefore, a need exists for a method and apparatus that maintains the necessary order and control on the processing of commands to assure data consistency, but does not impose a cumulative delay on the processing of commands, and does not cause memory access deadlocks. A need also exists for a method and apparatus that allows for command processing optimization without regard to the particular ordering of commands from the client process.