1. Field of the Invention
The present invention relates to method and apparatus for managing the memory in a computer. More specifically, the present invention relates to method and apparatus for switching the context of state elements in a very fast processor whenever there is a cache miss.
2. Art Background
It is quite common for a fast central processing unit to have a cache memory in addition to a main computer memory. The cache memory, typically smaller but much faster than the main computer memory, is placed between the processor and the main memory. During the execution of a software program, the cache memory stores the most frequently utilized instructions and data. Whenever the processor needs to access information from memory, the processor examines the cache first before accessing the main computer memory. A cache miss occurs if the processor cannot find the instruction or data in the cache memory and is required to access the slower main memory. Thus, the cache memory reduces the average memory access time of a processor.
An instruction stream is a sequence of instructions executed by the processor to accomplish a given process, such as add or divide. To date, a processor does one of two things when it encounters a cache miss in an instruction stream: (1) it stays idle until the instruction or data access to the main memory completes, or (2) it executes other instructions in the stream out of order. These two approaches are acceptable as long as they produce no substantial increase in the physical size (real estate) of the processor, and the penalty for cache misses does not overwhelm the average instruction cycle of the processor. Cache memory is typically 32K to 1M bytes in size, and hence does not occupy significant space on a processor chip or a processor board.
However, the faster processors become, the heavier the penalty for cache misses. The penalty for cache misses refers to the length of time a processor takes to retrieve needed information from the main memory upon occurrence of a cache miss. In a typical 40 MHz microprocessor capable of executing 40 million instructions per second (MIPS), the penalty for every cache miss is twenty (20) clock cycles. Assuming a 1% cache miss rate for the cache memory and further assuming one clock cycle per instruction for the very fast processor, the average number of clock cycles per instruction of such a processor would be 0.01(20) + 0.99(1) = 1.19 instead of 1.0 because of the cache miss penalty. As such, the processor delivers merely 40/1.19 = 33.6 MIPS.
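The arithmetic above may be reproduced with the following minimal sketch. The function names effective_cpi and delivered_mips are hypothetical and are introduced here for illustration only; the figures are those stated above (40 MHz clock, 1% miss rate, 20-cycle miss penalty, one cycle per instruction on a hit).

```python
def effective_cpi(miss_rate, miss_penalty_cycles, hit_cycles=1.0):
    """Average clock cycles per instruction, weighting misses and hits.

    Hypothetical helper: mirrors the expression 0.01(20) + 0.99(1).
    """
    return miss_rate * miss_penalty_cycles + (1.0 - miss_rate) * hit_cycles


def delivered_mips(clock_mhz, cpi):
    """Instructions per second (in millions) at the given clock rate and CPI."""
    return clock_mhz / cpi


cpi = effective_cpi(miss_rate=0.01, miss_penalty_cycles=20)
mips = delivered_mips(clock_mhz=40, cpi=cpi)
print(f"CPI = {cpi:.2f}, throughput = {mips:.1f} MIPS")
```

Under the same assumptions, raising the miss penalty while holding the miss rate fixed lowers the delivered MIPS further, which is the effect described above for faster processors.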
It is quite common to pipeline the instruction and memory operations of fast processors. A pipeline refers to a processor's ability to perform multiple tasks concurrently in the same clock cycle. Just as cache misses slow down the raw processing speed of fast processors, they also create bottlenecks in the pipelines of the processors.
The adverse effects of the cache miss penalty and pipeline bottlenecks on the speed of very fast processors render secondary the previous concern of minimizing the physical size of a processor. The trade-off is inevitable in view of the dramatic increase in speed of processors and the popularity of pipeline processing.
As will be described more fully below, the present invention provides method and apparatus for switching the context of state elements in a very fast processor upon occurrence of a cache miss. The present invention saves the state of a first process upon a cache miss and permits the processor to begin executing a second process within one clock cycle. Should the second process encounter another cache miss, the processor may return within one clock cycle to finish executing the first process if the necessary data has been retrieved from the main memory. Otherwise, the processor may begin executing a third process. It is understood that the number of processes whose states are duplicated may easily be a large number n.
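The selection behavior described above may be illustrated by the following software sketch. It is a simplified model under stated assumptions and not a description of the hardware: the names Context, pick_next, and on_miss are hypothetical, the 20-cycle penalty is taken from the earlier example, and the policy of preferring the earliest-numbered ready process is an assumption chosen to match the first, second, third ordering given above.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Saved state of one process (hypothetical model)."""
    name: str
    ready_at: int = 0  # cycle at which any pending memory data has arrived


def pick_next(contexts, current, now):
    """Return the index of the first other context whose data is available.

    Assumed policy: scan in process order, skipping the one that just missed.
    Returns None if every other context is still waiting on memory.
    """
    for i, ctx in enumerate(contexts):
        if i != current and ctx.ready_at <= now:
            return i
    return None


def on_miss(contexts, current, now, penalty=20):
    """Record the miss for the current context and choose the next to run."""
    contexts[current].ready_at = now + penalty
    return pick_next(contexts, current, now)


procs = [Context("first"), Context("second"), Context("third")]
nxt = on_miss(procs, current=0, now=0)    # first process misses at cycle 0
print(procs[nxt].name)                    # second process runs next
nxt = on_miss(procs, current=nxt, now=5)  # second misses before first is ready
print(procs[nxt].name)                    # third process runs next
```

Had the second process missed at cycle 20 or later in this model, the data for the first process would already have arrived and pick_next would return to the first process instead of starting the third.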