1. Field of the Invention
This invention relates generally to minicomputing systems, and particularly to storage hierarchies having high speed low capacity storage devices coupled via a system bus to lower speed high capacity storage devices, and more particularly to a private CPU-Cache Memory Interface.
2. Description of the Prior Art
The storage hierarchy concept is based on the phenomenon that individual programs under execution exhibit the behavior that, in a given period of time, a localized area of memory receives a very high frequency of usage. Thus, a memory organization that provides a relatively small buffer at the CPU interface, backed by various levels of slower storage of increasing capacity, can provide an effective access time that lies somewhere between that of the fastest and the slowest elements of the hierarchy, and provides a large capacity memory system that is "transparent" to the software.
This invention takes advantage of a word organized memory. Early prior art was limited to storing the requested data word, together with its address, in hardware registers. When the need arose for expanded, low cost buffers, the prior art utilized a block organization: if a particular word was requested by the CPU, the entire block containing that word was stored in a high speed data buffer. This had the disadvantage of bringing into the high speed buffer words with a relatively low probability of usage. Assuming a four word block, if word 4 is requested, the entire block, including words 1, 2 and 3 which have a relatively low probability of usage, is brought into the high speed buffer. To optimize the usage of the memory hierarchy, the operating system must organize memory in such a manner that software submodules and data blocks start with word 1 of a block. To overcome this difficulty, the prior art utilized a "block look ahead": while one block was in the high speed buffer, a decision was made during the processing of a data word in that block to bring the next block into the high speed buffer.
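The block-fetch behavior described above can be illustrated with a minimal sketch (illustrative only, not part of the disclosed invention): on a miss, the buffer fetches the entire aligned four-word block containing the requested word, so a request for word 4 of a block also brings in words 1 through 3.

```python
# Illustrative sketch of a block-organized high speed buffer.
# BLOCK_SIZE and the addressing scheme are assumptions for this example.

BLOCK_SIZE = 4  # words per block, as in the four-word example above

class BlockBuffer:
    def __init__(self):
        self.blocks = {}  # block number -> list of words held in the buffer

    def read(self, main_memory, address):
        block_no = address // BLOCK_SIZE
        if block_no not in self.blocks:            # miss: fetch the whole block
            base = block_no * BLOCK_SIZE
            self.blocks[block_no] = main_memory[base:base + BLOCK_SIZE]
        return self.blocks[block_no][address % BLOCK_SIZE]

memory = list(range(100, 116))   # 16 words of "main memory"
buf = BlockBuffer()
word = buf.read(memory, 7)       # request address 7 (word 4 of block 1)
# the buffer now also holds addresses 4, 5 and 6, which were never requested
```

The point of the sketch is the last comment: three of the four words fetched have, in the patent's terms, a relatively low probability of usage.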
Some typical patents indicative of this philosophy are as follows:
U.S. Pat. No. 3,231,868 issued to L. Bloom, et al, entitled "Memory Arrangement for Electronic Data Processing System" discloses a "look aside" memory which stores a word in a register and its main memory address in an associated register. To improve performance, U.S. Pat. No. 3,588,829, issued to L. J. Boland, et al, discloses an eight-word block fetch to the high speed buffer from main memory if any word in the eight-word block is requested by the CPU.
An article by C. J. Conti, entitled "Concepts for Buffer Storage" published in the IEEE Computer Group News, March 1969, describes the transfer of 64-byte blocks as used on the IBM 360/85 when a particular byte of a block not currently in the buffer is requested. The IBM 360/85 is described generally on pages 2 through 30 of the IBM Systems Journal, Vol. 7, No. 1, 1968. U.S. Pat. No. 3,820,078 issued to Curley, et al, entitled "Multilevel Storage System Having a Buffer Store with Variable Mapping Modes" describes the transfer of blocks of 32 bytes or half blocks of 16 bytes from main memory to the high speed buffer when a word (4 bytes) of the block or half block is requested by the CPU. U.S. Pat. No. 3,896,419 issued to Lange, et al, entitled "Cache Memory Store in a Processor of a Data Processing System" describes the transfer of a four word block from main memory to the high speed buffer when a word of that block is requested by the CPU. U.S. Pat. No. 3,898,624 issued to Tobias entitled "Data Processing System with Variable Prefetch and Replacement Algorithms" describes the prefetching of the next line (32 bytes) from main memory to the high speed buffer when a specific byte of the previous line is requested by the CPU.
In minicomputers, particularly those minicomputers which are organized in such a fashion that a plurality of system units are connected in common to a system bus, the prior art systems present a number of problems, all having to do with reducing the throughput of the minicomputer. The prior art sends back to cache from main memory the entire block of words in which the requested word is located. This includes words with addresses preceding the requested word and words with addresses following the requested word. In most cases the CPU will require, on the following cycle, the word at the next higher address. The result is that words with a high probability of being used, as well as words with a lower probability of being used, are transferred into cache. To overcome this problem, the prior art requires that the programmers of the operating system optimize their programs to start sequences with words at the first address of each block. Another problem in the prior art is that a block of words transferred from main memory to cache comes over in successive cycles; for example, a 32 byte block may be transferred in 8 cycles, 4 bytes at a time. In the minicomputer bus architecture system this would greatly reduce the throughput of the system.
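The bus-occupancy arithmetic above can be stated directly (a sketch using the figures cited in the text; the cycle counts are the text's assumptions, not measurements): a block transfer ties up the shared system bus for as many cycles as there are bus-width units in the block, whereas a single-word transfer occupies it only once.

```python
# Bus cycles consumed by the block transfer described above.
block_bytes = 32        # block size cited in the text
bytes_per_cycle = 4     # bus width cited in the text (one word per cycle)

cycles_per_block = block_bytes // bytes_per_cycle  # cycles to move one block
cycles_per_word = 1                                # cycles to move one word

# Every cache miss serviced by a block fetch therefore occupies the
# shared system bus eight times longer than a single-word fetch would.
print(cycles_per_block, cycles_per_word)
```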
Still another problem in the minicomputer system utilizing a system bus and an I/O bus (input/output bus) type of architecture is the increase in traffic over the system bus when CPU read requests have to be satisfied utilizing the system bus, because such an increase in traffic further decreases the throughput of the minicomputer system.
What was needed, therefore, was a cache memory system which would not only provide for the greatest probability of hits (i.e. finding the word resident in cache memory when a request is made by some unit) but would not increase traffic on the system bus in satisfying the various read or write requests in a computer architecture which utilizes a bus for interconnecting various components of the computer system.
Studies of memory access behavior during program execution indicate that over 90% of the accesses to memory were to read instructions or data, and fewer than 10% of the accesses by the central processor were to write into memory. Furthermore, most programs contain execution loops in which a relatively small number of instruction and data locations are referenced repeatedly. Accordingly, depending on the program, between 80 and 95% of the total accesses could therefore be satisfied by reading from the cache. Accordingly, a direct private interface between the processor and cache, and the use of high speed logic circuits therebetween, not only could reduce the processor access wait time to a fraction of the access delay encountered when accessing main memory through the system bus, but would also reduce information transfer traffic on the bus. However, since it is desirable not to inhibit or slow down communications between other units connected to the bus and main memory, direct access to main memory by such other units, including the CPU, is preferable.
In the prior art there are innumerable devices wherein there are direct connections between the CPU and cache memory. Some typical ones are disclosed in the following U.S. Pat. Nos.: (1) 3,820,078 issued June 25, 1974; (2) 3,735,360 issued May 22, 1973; (3) 3,898,624 issued Aug. 5, 1975; (4) 3,806,888 issued Apr. 23, 1974; and (5) 3,896,419 issued July 22, 1975. However, most of these arrangements do not provide for direct access of main memory by the CPU, and none of the above provides communication between system components, i.e. peripherals, controllers, main memory and CPU, via a system bus to which they are connected.