1. Field of the Invention
The present invention relates to the field of computer systems. More particularly, the present invention relates to a cache memory controller and method for fetching data for a central processing unit (CPU) that reduces CPU idle time.
2. Art Background
Typically the central processing unit (CPU) in a computer system operates at a substantially faster speed than the main memory. In order to avoid having the CPU idle too often while waiting for data or instructions from the main memory, a cache memory which can operate at a higher speed than the main memory is often used to buffer data and instructions between the main memory and the CPU. The data and instructions in memory locations of the main memory are mapped into the cache memory in block frames. Each block frame comprises a plurality of block offsets corresponding to a plurality of memory locations storing a plurality of the data and instructions. To further improve the overall CPU performance, some computer systems employ separate cache memories, one for data and one for instructions.
However, the use of separate cache memories does not solve the problem entirely. When a cache read miss occurs, that is, when the datum or instruction requested by the CPU is not in the cache memory, the cache memory has to retrieve the datum or instruction from the main memory. To do so, typically the entire block frame of data or instructions comprising the requested datum or instruction is retrieved, and the CPU goes idle until the retrieval is completed. For other cache performance problems and improvement techniques, see J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, pp. 454-461 (Morgan Kaufmann, 1990).
The amount of time it takes to fill the cache memory with a replacement block frame is dependent on the block size and the transfer rate of the cache memory-main memory hierarchy. For example, if the block size is eight (8) words and the speed of the main memory is two (2) block offsets per three (3) clock cycles, then it takes eleven (11) clock cycles to fill the cache memory with the replacement block frame. Reducing the block frame size or filling a partial block on a cache read miss does not necessarily reduce the CPU idle time, since it will increase the likelihood of future cache read misses.
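The arithmetic in this example can be checked with a short sketch. It assumes a timing model that the text above does not spell out: the main memory delivers two block offsets on consecutive clock cycles, then spends one cycle transferring nothing, and the fill is counted as complete on the cycle in which the last block offset arrives. The function name and parameters are illustrative only.

```python
def fill_cycles(block_size, offsets_per_burst=2, cycles_per_burst=3):
    """Clock cycles needed to transfer one block frame.

    Assumed model: in every group of `cycles_per_burst` cycles, the first
    `offsets_per_burst` cycles each deliver one block offset and the
    remaining cycles deliver nothing.
    """
    if block_size <= 0 or not 0 < offsets_per_burst <= cycles_per_burst:
        raise ValueError("invalid block size or transfer rate")
    full_bursts, leftover = divmod(block_size, offsets_per_burst)
    if leftover == 0:
        # The fill ends on the last transfer of the final burst, so the
        # idle cycles that would follow that burst are not counted.
        return (full_bursts - 1) * cycles_per_burst + offsets_per_burst
    return full_bursts * cycles_per_burst + leftover


# Eight-word block, two block offsets per three clock cycles.
print(fill_cycles(8))
```

Under these assumptions the sketch reproduces the eleven clock cycles given in the example: eight cycles that carry a block offset plus three cycles that carry none.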
Various techniques have been used to minimize the amount of CPU idle time waiting for the cache memory when cache read misses occur. One common practice is early restart, that is, as soon as the requested datum or instruction arrives, it is sent to the CPU without waiting for the writing of the entire block to be completed. Therefore, the CPU may resume its execution while the rest of the replacement block frame is being written.
A further refinement of the early restart technique is out of order fetch, which is a request to the main memory to retrieve the requested datum or instruction first, skipping all the data or instructions before the requested datum or instruction in the replacement block frame. As with early restart, the retrieved datum or instruction is sent to the CPU as soon as it is retrieved. Again, the CPU may resume its execution while the rest of the replacement block frame is being retrieved. After retrieving the requested datum or instruction, the main memory continues to retrieve the remaining data and instructions in the replacement block frame, starting with the datum or instruction after the requested one and continuing to the end of the block frame, and then loops around to retrieve the previously skipped data or instructions at the beginning of the block frame, until the entire block frame has been retrieved. Thus, the CPU can resume execution as soon as the first datum or instruction is retrieved from the main memory.
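The retrieval order described for out of order fetch can be sketched as follows. This is a minimal illustration, not an implementation of any particular memory system; the function name and the use of zero-based block offset numbers are assumptions made for the example.

```python
def out_of_order_fetch_order(block_size, requested_offset):
    """Order in which block offsets are retrieved under out of order fetch.

    Retrieval starts at the requested block offset, runs to the end of the
    block frame, then wraps around to the offsets that were skipped.
    Offsets are numbered from 0 (an assumption of this sketch).
    """
    if not 0 <= requested_offset < block_size:
        raise ValueError("requested offset lies outside the block frame")
    return [(requested_offset + i) % block_size for i in range(block_size)]


# Eight-word block frame, CPU requested block offset 5.
print(out_of_order_fetch_order(8, 5))  # [5, 6, 7, 0, 1, 2, 3, 4]
```

The requested block offset is always first in the list, which is why the CPU can resume after a single retrieval, whereas early restart without out of order fetch would first have to retrieve offsets 0 through 4.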
However, because handling subsequent requests from the CPU while trying to fill the rest of the replacement block frame gets complicated quickly, the CPU typically goes idle again after the datum or instruction is executed, and waits for the remaining retrievals to be completed. The CPU goes idle and waits, even if the datum or instruction subsequently requested by the CPU is already in the cache memory or is part of the replacement block frame currently being retrieved. Thus, the benefits from early restart and out of order fetch are limited, especially if the CPU is likely to complete its execution before the rest of the replacement block frame is written to the cache memory. This is especially likely to occur on computer systems where the number of clock cycles required to execute a typical instruction is small, for example, reduced instruction set computer (RISC) systems.
Since the cache memory typically operates at a higher speed than the main memory, there are dead cycles where the cache memory is waiting for data or instructions to be transferred from the main memory, while the CPU is waiting for the cache memory. The number of dead cycles is also dependent on the block frame size and the transfer rate of the cache memory-main memory hierarchy. In the example discussed above, there are three (3) dead cycles per writing of a block frame, one in every three clock cycles. Therefore, subsequent requests for data or instructions that are in the cache memory can be satisfied during these dead cycles, thereby further reducing CPU idle time. The problem is knowing that the data or instructions are in the cache memory and synchronizing their read out from the cache memory to these dead cycles, without substantial investment in additional hardware.
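To make the dead cycle count concrete, the sketch below steps through a block frame fill one clock cycle at a time and records the cycles in which no block offset arrives. It assumes the same hypothetical timing model as the example (two consecutive transfer cycles followed by one cycle without a transfer, repeating); the function name and parameters are illustrative.

```python
def dead_cycles(block_size, offsets_per_burst=2, cycles_per_burst=3):
    """Clock cycles (numbered from 1) during a block frame fill in which
    no block offset is transferred from the main memory.

    Assumed model: in every group of `cycles_per_burst` cycles, only the
    first `offsets_per_burst` cycles each deliver one block offset.
    """
    if block_size <= 0 or not 0 < offsets_per_burst <= cycles_per_burst:
        raise ValueError("invalid block size or transfer rate")
    dead = []
    transferred = 0
    cycle = 0
    while transferred < block_size:
        cycle += 1
        position_in_burst = (cycle - 1) % cycles_per_burst
        if position_in_burst < offsets_per_burst:
            transferred += 1      # a block offset arrives this cycle
        else:
            dead.append(cycle)    # cache memory is idle this cycle
    return dead


# Eight-word block, two block offsets per three clock cycles.
print(dead_cycles(8))  # [3, 6, 9]
```

Under this model the dead cycles fall on clock cycles 3, 6 and 9 of the eleven-cycle fill, which are the slots in which a request for a datum or instruction already in the cache memory could, in principle, be serviced.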
Likewise, to satisfy the subsequent requests for data or instructions that are in the process of being retrieved from the main memory, the problem is knowing when the data or instructions are retrieved and synchronizing their direct transfer to the CPU with their retrieval, without substantial investment in additional hardware.
As will be described, the present invention overcomes the disadvantages of the prior art, and provides a cache memory controller and method for fetching data for a CPU that reduces CPU idle time.