To increase processing speed within a microprocessor (also referred to herein as a CPU (central processing unit)), designers implement buffers and/or caches within the microprocessor chip (integrated circuit) to compensate for the speed differential between main memory access time and processor logic. Processor logic is generally faster than main memory, with the result that processing speed is limited mostly by the speed of main memory. A technique used to compensate for this mismatch in operating speeds is to place an extremely fast, small memory between the CPU and main memory, with an access time close to processor logic propagation delays. This small memory stores segments of programs currently being executed in the CPU and/or temporary data frequently needed in the present calculations. By making programs (instructions) and data available at a rapid rate, it is possible to increase the performance of the processor.
Analysis of a large number of typical programs has shown that the references to memory within any given interval of time tend to be confined to a few localized areas in memory. This phenomenon is sometimes referred to as the property of "locality of reference." The reason for this property may be understood by considering that a typical computer program flows in a straight-line fashion, with program loops and subroutine calls encountered frequently. When a program loop is executed, the CPU repeatedly refers to the set of instructions in memory that constitute the loop. Every time a given subroutine is called, its set of instructions is fetched from memory. Thus, loops and subroutines tend to localize the references to memory for fetching instructions.
If the active portions of the program and/or data are placed in a fast small memory, the average memory access time can be reduced, thus reducing the total execution time of the program. Such a fast small memory may be a cache memory or a buffer. Such a cache or buffer memory has an access time that is less than the access time of main memory, often by a factor of 5 to 10.
The fundamental idea of such a cache or buffer organization is that by keeping the most frequently accessed instructions and/or data in this fast cache or buffer, the average memory access time will approach the access time of the cache or buffer.
The basic operation of such a cache or buffer is as follows. When the CPU needs to access an instruction or data, the cache or buffer is examined. If the instruction or data word is found in the cache or buffer, it is read by the CPU. If the word addressed by the CPU is not found in the cache or buffer, the main memory is accessed to read the word. A block of words containing the one just accessed is then transferred from main memory to the cache or buffer memory. In this manner, some data is transferred to the cache or buffer so that future references to memory find the required words in the cache or buffer.
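The lookup-and-fill sequence described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular design: the direct-mapped placement, the block size `BLOCK_WORDS`, the line count `NUM_LINES`, and the class name `Cache` are all assumptions introduced here for the example.

```python
# Minimal sketch of a cache that fills a whole block on a miss.
# Assumptions (not from the source text): direct-mapped placement,
# 4 words per block, 8 cache lines.
BLOCK_WORDS = 4
NUM_LINES = 8


class Cache:
    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}   # line index -> (block number, list of words)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // BLOCK_WORDS    # block containing the word
        offset = address % BLOCK_WORDS    # position of the word in the block
        index = block % NUM_LINES         # cache line the block maps to

        entry = self.lines.get(index)
        if entry is not None and entry[0] == block:
            # Word found in the cache: the CPU reads it directly.
            self.hits += 1
            return entry[1][offset]

        # Word not found: read it from main memory, then transfer the
        # block containing it into the cache for future references.
        self.misses += 1
        word = self.main_memory[address]
        start = block * BLOCK_WORDS
        self.lines[index] = (block, self.main_memory[start:start + BLOCK_WORDS])
        return word


memory = list(range(100, 164))   # 64 words of hypothetical main memory
cache = Cache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 4, 0)]
print(values, cache.hits, cache.misses)
```

In this trace, the first read of address 0 misses and brings in the block holding addresses 0 through 3, so the following reads of addresses 1, 2, and 3 are found in the cache.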
The average memory access time of the computer system can be improved considerably by the use of the cache or buffer. The performance of cache or buffer memory is frequently measured in terms of a quantity called "hit ratio." When the CPU refers to memory and finds the word in the cache or buffer, it is said to produce a "hit." If the word is not found in the cache or buffer, it counts as a "miss." The hit ratio is the number of hits divided by the total number of CPU references to memory (hits plus misses). If the hit ratio is high enough so that most of the time the CPU accesses the cache or buffer instead of main memory, the average access time is closer to the access time of the cache or buffer memory. For example, a computer with a cache or buffer access time of 100 nanoseconds, a main memory access time of 1,000 nanoseconds, and a hit ratio of 0.9 produces an average access time of 200 nanoseconds. This is a considerable improvement over a similar computer without a cache or buffer memory, whose access time is 1,000 nanoseconds.
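The 200 nanosecond figure can be checked with a short calculation. The sketch below assumes one common accounting, in which a miss costs the time to examine the cache or buffer plus the time to access main memory; the function name `average_access_time` is hypothetical and used here only for illustration.

```python
def average_access_time(hit_ratio, cache_time_ns, main_time_ns):
    """Average access time, assuming a miss costs the cache check
    plus the main memory access (an assumption of this sketch)."""
    miss_ratio = 1.0 - hit_ratio
    hit_cost = cache_time_ns
    miss_cost = cache_time_ns + main_time_ns
    return hit_ratio * hit_cost + miss_ratio * miss_cost


# Values from the example: 100 ns cache, 1,000 ns main memory, 0.9 hit ratio.
avg = average_access_time(0.9, 100, 1000)
print(round(avg))   # about 200 ns, up to floating-point rounding
```

Under this accounting, nine of every ten references cost 100 nanoseconds and the tenth costs 1,100 nanoseconds, which averages to 200 nanoseconds.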
One of the problems associated with the foregoing technology occurs during the fetching of the additional data associated with the word accessed by the CPU. It is possible that during the fetch of the remaining portion of the block of data the CPU will issue a request for an instruction or data having an address that is not contained within the block of data being fetched. With prior art implementations, the CPU must wait until the block of data has been completely fetched into the cache or buffer memory. This delays the provision of the instruction or data pertaining to the requested address to the CPU for several cycles.
Thus, there is a need in the art for a system and method for improving the speed and efficiency of the fetching process within a data processing system.