It has been shown that placing a small, high-speed memory, often called a cache memory, between an instruction processor and a much slower main memory tends to enhance the performance of a data processing system. Instructions and data that are resident within the cache memory when the instruction processor requests them are furnished much more quickly than instructions and data that must be obtained from main memory.
To obtain maximum benefit from a cache memory, it is desirable to anticipate which memory locations the instruction processor will access so that they can be preloaded into the cache memory. U.S. Pat. No. 3,806,888 issued to Brickman et al. shows an early data processing system employing a cache memory residing between the main memory and the instruction processor, or central processing unit. In this system, real memory is segmented into blocks or pages. If the instruction processor requests access to one data element of a block, the entire block is automatically transferred to the cache memory for subsequent use by the instruction processor. U.S. Pat. No. 4,225,922 issued to Porter attempts to improve upon the basic cache approach by segmenting the cache memory and by buffering cache commands. U.S. Pat. No. 4,354,232 issued to Ryan also buffers cache control commands.
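To make the block-transfer behavior concrete, the following is a minimal Python sketch, not taken from the Brickman et al. patent, of a cache that copies an entire block from main memory whenever any word in that block is first requested. The class name `BlockCache`, the block size of four words, and the hit/miss counters are hypothetical illustrations; the sketch also assumes unlimited cache capacity and therefore omits any replacement policy.

```python
BLOCK_SIZE = 4  # words per block (hypothetical value chosen for illustration)


class BlockCache:
    """Sketch of block-granular caching: a miss on any word loads its whole block."""

    def __init__(self, main_memory, block_size=BLOCK_SIZE):
        self.main_memory = main_memory  # list of words standing in for main memory
        self.block_size = block_size
        self.blocks = {}  # block number -> list of words copied from main memory
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Split the word address into a block number and an offset within the block.
        block_no, offset = divmod(address, self.block_size)
        if block_no not in self.blocks:
            # Miss: transfer the entire block containing the address, not just one word.
            self.misses += 1
            start = block_no * self.block_size
            self.blocks[block_no] = self.main_memory[start:start + self.block_size]
        else:
            # Hit: the word was brought in earlier as part of its block.
            self.hits += 1
        return self.blocks[block_no][offset]
```

For example, with `memory = list(range(100, 116))`, reading address 5 misses and loads words 4 through 7, so subsequent reads of addresses 4, 6, and 7 are hits.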
The pre-fetching of data may be further complicated by variable-length data elements. U.S. Pat. No. 4,189,772 issued to Liptay attempts to address this problem by buffering the input to the cache memory. U.S. Pat. No. 4,437,149 issued to Pomerene et al. adds a decoder element between the main memory and the cache memory to partially decode instructions before the cache memory is loaded.
U.S. Pat. No. 5,025,366 issued to Baror places the cache memory and the cache controller on the same substrate. U.S. Pat. No. 4,905,188 issued to Chuang et al. describes a chip design intended to optimize the hardware construction.
A multiprocessor system is shown in U.S. Pat. No. 5,023,776 issued to Gregor. The individual instruction processors have dedicated cache memories. Shared cache memories are interposed between the dedicated cache memories and the main memory. Write buffers are employed in parallel with the shared caches. Multiple sequential writes bypass the shared cache and proceed directly to the main memory through the write buffers.
U.S. Pat. No. 5,423,016, issued to Tsuchiya et al., discloses a system that stores pre-fetched data elements in a block buffer before loading them into a cache memory. The block is loaded beginning with the requested data element, so that the requested element is made available to the instruction processor as soon as it reaches the block buffer. The instruction processor is thereby permitted to execute subsequent instructions from the cache memory in parallel with the loading of the remainder of the requested block into the block buffer. This is possible because the cache memory is not busy storing the requested block while that block is being assembled in the block buffer.
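The requested-element-first loading order can be illustrated with a short Python sketch. It assumes one word is transferred per cycle and that, after the requested word, the transfer continues to the end of the block and wraps around to its beginning; the wrap-around order and the function names `fill_order` and `cycles_until_available` are assumptions for illustration, not details taken from the Tsuchiya et al. patent.

```python
def fill_order(requested_offset, block_size):
    """Word offsets in the order they enter the block buffer.

    Starts with the requested word, then (assumed) continues to the end of the
    block and wraps around to offset 0.
    """
    return [(requested_offset + i) % block_size for i in range(block_size)]


def cycles_until_available(order, requested_offset):
    """Transfer cycles until the requested word is in the buffer (one word per cycle assumed)."""
    return order.index(requested_offset) + 1


if __name__ == "__main__":
    block_size = 8
    requested = 5
    critical_first = fill_order(requested, block_size)
    sequential = list(range(block_size))  # conventional fill from offset 0
    print(critical_first, cycles_until_available(critical_first, requested))
    print(sequential, cycles_until_available(sequential, requested))
```

For an eight-word block with the requested word at offset 5, this order makes the word available after one transfer cycle, whereas a strictly sequential fill starting at offset 0 would make it available only after six.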
U.S. Pat. No. 5,724,533 to Kuslak et al. discloses a method and apparatus for efficiently halting the operation of the instruction processor when a cache miss is detected. Generally, this is accomplished by preventing unwanted address incrementation of an instruction address pipeline and by providing a null instruction to an instruction pipeline when a cache miss is detected. The system is adapted to eliminate a recovery period after the cache miss is handled.
While the above-described systems address some of the performance issues associated with accessing cache memory, they do not address the unique problems associated with performing cache write operations efficiently. To maintain memory coherency, write operations must generally be performed in the order in which the associated instructions appear in the instruction stream. However, when one or more cache misses occur during sequential write operations, the data needed to complete the write operations may not be loaded into the cache in an order that allows processing to continue while maintaining memory coherency. Processing of the write operations therefore stalls until all of the needed memory data has been loaded into the cache, and system throughput is diminished.
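The stall described above can be modeled with a small Python sketch that enforces strict program order on writes. Each write targets a cache block, each block becomes resident at some cycle (0 for a hit), and a write may complete only after its block is resident and the preceding write has completed. The function name `completion_cycles`, the one-write-per-cycle rate, and the cycle values are hypothetical simplifications, not a model of any particular system.

```python
def completion_cycles(write_blocks, arrival_cycle):
    """Cycle at which each write completes under strict in-order completion.

    write_blocks:  block targeted by each write, in instruction-stream order.
    arrival_cycle: cycle at which each block becomes resident in the cache
                   (0 means the block is already present, i.e. a hit).
    Assumes at most one write completes per cycle.
    """
    result = []
    previous = -1  # completion cycle of the preceding write (none yet)
    for block in write_blocks:
        # A write waits for its own block AND for every earlier write to finish.
        ready = max(arrival_cycle[block], previous + 1)
        result.append(ready)
        previous = ready
    return result


if __name__ == "__main__":
    # First write misses (block A arrives at cycle 10); the next two are hits.
    print(completion_cycles(["A", "B", "C"], {"A": 10, "B": 0, "C": 0}))
```

With a miss on the first write whose block arrives at cycle 10, the two following writes are hits but still cannot complete until cycles 11 and 12, showing how a single miss early in the write stream holds up every later write.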
Another issue not addressed by the foregoing systems is diminished bandpass on the cache memory interfaces. In prior art systems, any pre-fetching of instructions and data is initiated over the same primary address and data interfaces that are used to return instructions and data to the requesting processor logic. As a result, pre-fetch operations consume interface bandwidth and diminish system throughput.
What is needed, therefore, is an improved system and method to perform pre-fetching of data within a data processing system in a manner that does not impact the bandpass of primary memory interfaces, and that addresses the unique problems associated with performing memory write operations.