A present-day computer is made up of three primary sections. These sections comprise a central processing unit which processes the data through the computer, a memory unit which stores the data and which also stores the program for the central processing unit, and an input/output unit through which the data and a number of control signals pass.
Originally, the memory unit was a uniform array of storage devices which retained the binary 1's and 0's making up the program and data. The present-day computer, however, usually has several types of memory units, which fall into two major categories.
The first major category is the main memory, which is usually composed of random access memory units (RAMs), read only memory units (ROMs), programmable read only memory units (PROMs), and the like. The central processing unit (CPU) can access the main memory several million times per second, with each access being anywhere from a single binary bit, up to 64 bits, or more. The total volume of the main memory in present-day computers extends from a few hundred bytes to several million bytes.
The second major category of the present-day computer memory is referred to as the mass storage unit. This unit is usually composed of one or more disc drives, but it may also include magnetic tapes, cassettes, floppy discs, drums, cartridges, paper tape, and the like. In any case, access to the mass storage unit is through the input/output section of the computer, and always takes much longer than access to the main memory. For example, a typical disc access might require 30 milliseconds, and a tape access might require several seconds. Also, mass storage access usually involves the transfer of hundreds of bits. Total volume of a mass memory system is usually from several hundred thousand bytes to several thousand million bytes.
Obviously, the potential of any computer is a function of its memory size. Accordingly, memory sizes are constantly being enlarged, and the designs for both main memory and mass memory components are presently advancing rapidly in the art. Likewise, the faster the central processing unit can access the memory the greater the potential power which is available. A major portion of recent development efforts has gone into increasing the speed of the central processing unit, and also into increasing the memory access speed, because a fast central processing unit is ineffective without a fast access to its memory.
Presently, there are memory units which meet nearly every individual requirement needed to optimize computer operation. The ideal memory unit contains a large number of bytes, is accessible in a small number of nanoseconds, and is inexpensive. Inexpensive memory units presently available on the market meet either the size requirement or the access speed requirement, but none meets both. Thus, compromises have been necessary in the prior art. If one desires a large, fast memory array, then fast, expensive units must be used. On the other hand, if one desires an inexpensive, large memory, inexpensive units are available which may contain many bytes, but which have slower access speeds.
Over the years, due to the factors enumerated above, various techniques have been used to make the main memory appear to have faster access time. This has been accomplished in some prior art systems by the introduction of a small, fast access memory array (known as the "cache" memory) which is placed between the central processing unit and the main memory array. In such a system, each time the central processing unit accesses a word in the fast memory, it reduces the average memory access time. On the other hand, each time the central processing unit in the prior art system accesses a word in the main memory, it increases the average memory access time.
Accordingly, it has become essential to devise a system in which most accesses are made from the cache memory, rather than from the main memory. Different approaches have been used in the past in attempts to improve the ratio of average number of accesses to the cache memory to average number of accesses to the main memory. However, the prior art approach has been to rely upon the natural characteristics of some computer programs which tend to access the same word repeatedly. In the prior art systems, when a word is selected by the central processing unit from the main memory, that word is transferred to the cache memory, so that each subsequent selection of that word will be from the cache memory.
An important objective of the present invention is to provide a memory system which includes a small fast access cache memory interposed between the central processing unit and the main memory, and which is constructed so that the ratio of the average number of accesses to the cache memory as compared with the average number of accesses to the main memory is substantially improved over the prior art systems of the same general type.
As explained above, many computer programs will access the same word several times, and that fact has been used in the prior art in attempts to increase the overall apparent access speed of the computer. Unfortunately, some computer programs access most words only once, and the prior art systems are not effective with such programs. However, all computer programs are primarily sequential. That is, when a program word or data word is accessed, the probability is very high that the next higher or lower word in the main memory will be accessed next. For example, a sub-routine may be fifty bytes long, but it is stored together, in contiguous locations in the main memory. Similarly, a typical data record, such as a customer's account file, which may be two thousand bytes long, is also stored in a selected area of the main memory in contiguous locations.
In accordance with the concepts of the present invention, when the central processing unit accesses the first word of a sub-routine or data block from the main memory, not only is that word transferred to the cache memory, but the contiguous words in the main memory are also simultaneously transferred to the cache memory. This results in many future accesses by the central processing unit being achieved from the cache memory instead of from the main memory, which guarantees a very high ratio of fast accesses as compared with slow accesses.
To implement the function described in the preceding paragraph, a parallel connection between the main memory and the cache memory is required. For example, a 64-byte wide connection (512 bit lines) might be provided. Accordingly, every time the central processing unit has to access a word from the main memory, an entire block of 64 bytes, for example, containing that word and other contiguous words, is simultaneously transferred from the main memory into the cache memory. Subsequently, whenever the central processing unit desires to access a word adjacent to the previously accessed word, the new word may be accessed from the cache memory. Therefore, one slow access of many bytes assures many fast subsequent accesses of those bytes.
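The effect of block transfers on a sequential access pattern can be illustrated with a short simulation. The following is a minimal sketch, not the circuitry of the invention: the function name `count_accesses`, the starting address, and the assumption that the cache never fills are all hypothetical and chosen only for illustration.

```python
BLOCK_SIZE = 64  # bytes per block, matching the 64-byte example in the text


def count_accesses(addresses, block_size=BLOCK_SIZE):
    """Return (cache_hits, main_memory_accesses) for a sequence of byte addresses.

    Assumption: the cache is large enough that a block, once loaded,
    is never overwritten.
    """
    resident_blocks = set()
    hits = 0
    misses = 0
    for address in addresses:
        block = address // block_size  # block containing this byte
        if block in resident_blocks:
            hits += 1                  # fast access from the cache
        else:
            misses += 1                # one slow access fetches the whole block
            resident_blocks.add(block)
    return hits, misses


# A 2000-byte record read one byte at a time, starting at address 0x1000.
hits, misses = count_accesses(range(0x1000, 0x1000 + 2000))
print(hits, misses)
```

Under these assumptions, each 64-byte block costs one main memory access, and the remaining bytes of that block are read from the cache.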
Specifically, the system of the invention provides for the simultaneous parallel transfer of words from the main memory to a cache memory. This transfer exceeds the central processing unit's immediate requirements in anticipation of the probable future requirements of the central processing unit. This technique provides an immediate improvement in the apparent access time to the main memory. Typically, only one out of every thirty to sixty accesses is made from the main memory. However, even if the average were only one out of ten, the improvement over the previous systems would be significant. An advantage of the system of the invention is that no special software modifications are required, since any software will benefit from the principle of "Locality of Reference".
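The benefit of these ratios can be estimated with a simple weighted average. The sketch below is illustrative only: the access times `CACHE_NS` and `MAIN_NS` are assumed values that do not come from this description, and the model ignores any extra delay for the block transfer itself.

```python
def average_access_time(main_fraction, cache_ns, main_ns):
    """Mean access time when `main_fraction` of accesses go to main memory."""
    return (1.0 - main_fraction) * cache_ns + main_fraction * main_ns


CACHE_NS = 100.0   # assumed cache access time, not taken from the text
MAIN_NS = 1000.0   # assumed main memory access time, not taken from the text

for one_in in (10, 30, 60):
    t = average_access_time(1.0 / one_in, CACHE_NS, MAIN_NS)
    print(f"1 in {one_in} accesses from main memory: {t:.1f} ns average")
```

With any such pair of timings, the average approaches the cache access time as the fraction of main memory accesses falls.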
In order to implement the simultaneous parallel transfer of a block of words from the main memory to the cache memory efficiently, special control circuitry is required, which will be described in detail. This control circuitry will be referred to as the "cache address storage and comparator" (CASC) circuitry. Every time the central processing unit desires to access memory, this control circuitry must determine whether the access can be made from the cache memory, or whether it must be made from the main memory. Every time the central processing unit desires to read from or write to memory, it places an address on the address bus corresponding to the word location in the main memory. However, the address of the same word in the cache memory is totally different. The control circuitry must therefore have access to the block addresses of all blocks stored in the cache memory, and it must quickly determine whether the word at the main memory address requested by the central processing unit is in the cache memory.
If the word requested by the central processing unit is indeed in the cache memory, then the control circuitry quickly converts the main memory address requested by the central processing unit to the corresponding cache memory address. For example, the main memory may contain byte addresses from 0 to 7FFFF (hexadecimal notation), and the cache memory addresses may only extend from 0 to 0FFF. Thus, a conversion of the 19-bit main memory address requested by the central processing unit to a 12-bit cache memory address must occur in the control circuitry automatically and rapidly. Typically, the low-order bits (4 to 10 bits) would be identical insofar as addressing both memories is concerned, so that only the higher-order bits need be converted.
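The address conversion described above can be sketched as follows. This is a hedged illustration, not the control circuitry itself: it assumes 64-byte blocks (6 low-order bits, within the 4 to 10 bit range mentioned), and the names `split_address`, `to_cache_address`, and `block_table` are hypothetical. A dictionary stands in for the stored block addresses.

```python
OFFSET_BITS = 6                       # assumed: 64-byte blocks
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # selects the low-order bits


def split_address(main_address):
    """Split a main memory address into (block address, offset within block)."""
    return main_address >> OFFSET_BITS, main_address & OFFSET_MASK


def to_cache_address(main_address, block_table):
    """Translate a 19-bit main memory address to a 12-bit cache address.

    `block_table` maps a main memory block address to the cache block
    location holding it. Returns None when the block is not in the cache.
    """
    block, offset = split_address(main_address)
    slot = block_table.get(block)
    if slot is None:
        return None
    # The low-order bits pass through unchanged; only the high-order bits change.
    return (slot << OFFSET_BITS) | offset


# Hypothetical contents: main memory block 0x1ABC is held in cache block 0x05.
block_table = {0x1ABC: 0x05}
cache_address = to_cache_address(0x6AF2A, block_table)
print(hex(cache_address))
```

Here the main memory address 0x6AF2A splits into block 0x1ABC and offset 0x2A, and only the block portion is replaced by the cache block location.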
If the word requested by the central processing unit is not contained in the cache memory, then the following series of actions by the control circuitry occurs: the central processing unit is signalled that the access will be to main memory, this being achieved by transmitting a "wait" command to the central processing unit; the block location in the cache memory is selected in which to place the new block from main memory, corresponding to the accessed word and a predetermined number of adjacent words; the block containing the desired word and its adjacent words is transferred by simultaneous transfer from the main memory to the cache memory; the main memory address of the block now in cache memory is stored in the control circuitry; the central processing unit is given access to the desired word which is now in the cache memory; and the central processing unit is signalled that the word is now available, by releasing the "wait" command.
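The sequence of actions above can be modelled in software roughly as follows. This is a sketch under stated assumptions rather than a description of the actual circuitry: the class `CacheSketch` and its attributes are hypothetical, the cache is assumed to hold 64 blocks of 64 bytes, only reads are modelled, and a simple rotating choice stands in for the block selection step.

```python
BLOCK_SIZE = 64   # bytes per block (the example width used in the text)
NUM_SLOTS = 64    # assumed: a 4096-byte cache holding 64 such blocks


class CacheSketch:
    """Hypothetical software model of the miss-handling sequence."""

    def __init__(self, main_memory):
        self.main = main_memory                  # bytes-like main memory
        self.slots = [None] * NUM_SLOTS          # contents of each cache block
        self.block_of_slot = [None] * NUM_SLOTS  # main block address held by each slot
        self.slot_of_block = {}                  # main block address -> slot
        self.next_slot = 0                       # placeholder replacement policy
        self.wait = False                        # models the "wait" signal
        self.main_accesses = 0                   # number of block transfers

    def read(self, address):
        block, offset = divmod(address, BLOCK_SIZE)
        slot = self.slot_of_block.get(block)
        if slot is not None:
            return self.slots[slot][offset]      # hit: no main memory access
        self.wait = True                         # signal "wait" to the CPU
        slot = self._select_slot()               # choose where the new block goes
        self._transfer(block, slot)              # copy the block, record its address
        word = self.slots[slot][offset]          # CPU now reads from the cache
        self.wait = False                        # release "wait"
        return word

    def _select_slot(self):
        slot = self.next_slot
        self.next_slot = (slot + 1) % NUM_SLOTS
        return slot

    def _transfer(self, block, slot):
        old_block = self.block_of_slot[slot]
        if old_block is not None:
            del self.slot_of_block[old_block]    # overwritten block leaves the cache
        start = block * BLOCK_SIZE
        self.slots[slot] = bytes(self.main[start:start + BLOCK_SIZE])
        self.block_of_slot[slot] = block
        self.slot_of_block[block] = slot
        self.main_accesses += 1


main_memory = bytes(range(256)) * 16   # 4096 bytes of sample data
cache = CacheSketch(main_memory)
first = cache.read(0x0100)             # miss: loads bytes 0x0100 through 0x013F
second = cache.read(0x0101)            # hit: served from the cache
print(first, second, cache.main_accesses)
```

In this model the second read touches only the cache, because the adjacent byte arrived with the block fetched by the first read.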
When a new block is to be transferred into the cache memory, but the cache memory is already full, one of the blocks in the cache memory must be deleted by being overwritten with the new block from the main memory. The system of the invention also contains special circuitry which implements a reasonably efficient algorithm to select the block in the cache memory which is to be eliminated. This special control circuitry is premised on the concept that a block status message is stored in the special circuitry corresponding to each block in the cache memory, and the status of the various messages stored in the special circuitry is adjusted periodically according to different algorithms which may be selected.
Then, when a decision to eliminate a particular block from the cache memory must be made, the special circuitry performs a high-speed scan of all the status messages in the special circuitry using simultaneous comparisons, and selects one which best meets the selected criteria. The system then transfers the block from the main memory to the particular block address within the cache memory corresponding to the block to be eliminated.
Thus, the special circuitry serves to store the status of the various blocks in the cache memory, to update the status periodically, to perform the high-speed scan of all the stored statuses, and then to convert the selected status to a particular address in the cache memory, corresponding to the block in that memory which is to be eliminated. This special circuitry will be referred to hereafter as the "cache address status storage and converter" (CASSAC) circuitry.
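One possible form of the status storage and scan can be sketched in software. The example below is an assumption made for illustration: it uses an age count as the status and selects the block with the greatest age, which is only one of the selectable algorithms the description allows for. The class `BlockStatusSketch` and its methods are hypothetical, and the sequential scan stands in for the simultaneous comparisons performed in hardware.

```python
class BlockStatusSketch:
    """Hypothetical status store: one age value per cache block."""

    def __init__(self, num_blocks):
        self.age = [0] * num_blocks

    def touch(self, slot):
        """Mark a block as just used by resetting its age."""
        self.age[slot] = 0

    def tick(self):
        """Periodic status update: every block grows one step older."""
        self.age = [a + 1 for a in self.age]

    def select_victim(self):
        """Scan every status and return the block location with the greatest age.

        On a tie, the lowest-numbered block location is returned.
        """
        return max(range(len(self.age)), key=lambda slot: self.age[slot])


status = BlockStatusSketch(4)
for slot in (0, 1, 2, 3):      # blocks 0 to 3 are used in turn
    status.touch(slot)
    status.tick()
status.touch(0)                # block 0 is used again
victim = status.select_victim()
print(victim)
```

In this run, block 1 is the one that has gone longest without use, so it is the block location returned for overwriting.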