1. Field of the Invention
The present invention relates to memory systems used in digital computer systems and more particularly to memory systems which include high-speed data caches.
2. Description of the Prior Art: FIG. 1
As CPUs have become faster, computer system performance has often been limited by the time required to perform the memory operations of fetching data from and writing data to memory. In order to speed up memory operations, the prior art has employed hierarchical memories. At the top of the hierarchy is a small amount of fast, expensive memory; at the bottom is a large amount of slow, cheap memory. For example, a virtual memory computer system may have three levels of memory: a high-speed cache which contains copies of data currently being referenced by the CPU, a main memory which contains copies of the data in the cache and additionally contains copies of data at memory locations near those containing the data currently being referenced, and one or more disk drives containing all of the data presently available to the CPU. As a program references data, the computer system typically copies pages containing the referenced data from the disk drive to the main memory and individual data items from the main memory to the cache. Once most of the data required to execute a program is in the cache, it is the time required to fetch data from the cache, rather than the time required to fetch data from the disk or main memory, which determines the speed with which the CPU can process data. Of course, if a memory operation alters data in the cache, then the computer system must ensure that the data in main memory and on disk from which the cache contents were copied is correspondingly altered. Similarly, if a memory operation performed by some other portion of the system, for example, an I/O device, alters the contents of a part of memory which has been encached, then the copies in the cache must also be altered.
As may be seen from the above overview of hierarchical memory systems, the primary problem in cache design is maintaining consistency between the contents of the cache and the data at other levels of the hierarchy. FIG. 1 shows the manner in which the prior art has solved the problem of consistency. FIG. 1 is a block diagram of a digital computer system including a cache of the type described in U.S. Pat. No. 4,445,177, Bratt, et al., Digital Data Processing System . . ., issued Apr. 24, 1984. The digital data processing system of FIG. 1 includes CPU 101, cache 103, and main memory 117. As may be seen from the connections between CPU 101, cache 103, and memory 117 in FIG. 1, all transfer of data between CPU 101 and memory 117 takes place via cache 103. If CPU 101 reads data and a copy of the data is not present in cache 103, cache 103 first obtains the data from memory 117 and then provides it to CPU 101. Similarly, if CPU 101 writes data, it writes the data to cache 103, which then updates the data in memory 117.
Turning now to cache 103, cache 103 is made up of two main components: store 107 and control 105. Store 107 contains the copies stored in the cache and information required for cache operation. The contents of store 107 are arranged as a series of registers 108. At a given moment, each register 108 corresponds to one address in memory 117 and may contain a copy of the data at that address in memory 117. At different times, a register 108 may correspond to different addresses in memory 117. Each register 108 contains a validity bit v 109, indicating whether the copy of data it contains is valid, a tag 111, which serves to relate the register 108 to the memory address to which it currently corresponds, data 113, which, when valid, contains a copy of the data at the corresponding address in memory 117, and a dirty bit d 115, which indicates whether data 113 has been altered since it was written back to memory 117.
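By way of illustration only, the layout of a register 108 described above might be modeled as follows. The class name Register, the field names, the helper make_store, and the store size of eight registers are assumptions made for this sketch and do not appear in the reference.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Illustrative model of one register 108 in store 107."""
    v: bool = False   # validity bit v 109: True when data holds a valid copy
    tag: int = 0      # tag 111: relates the register to a memory address
    data: int = 0     # data 113: copy of the data at that address
    d: bool = False   # dirty bit d 115: True when data was altered since write-back


NUM_REGISTERS = 8  # assumed size, chosen only for this sketch


def make_store(num_registers=NUM_REGISTERS):
    """Return a store in which every register starts out invalid."""
    return [Register() for _ in range(num_registers)]


store = make_store()
```

Each register is an independent object, so altering one register leaves the others unchanged, and a freshly made store contains no valid entries.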
Control 105 controls operation of the cache in response to the contents of v 109, tag 111, and d 115 and to addresses and control signals from the CPU and I/O devices. It further produces control signals of its own which synchronize the operation of CPU 101, cache 103, and memory 117. When CPU 101 performs a memory operation, the address of the data being operated on and a control signal indicating the kind of operation go to control 105. Control 105 uses a portion of the address to select a register 108; if tag 111 in that register 108 has the same value as the remainder of the address, the register 108 corresponds to the location in memory 117 specified by the address.
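The selection of a register 108 and the tag comparison just described can be sketched as below. The text does not state which portion of the address selects the register; this sketch assumes the low-order bits do so and that the remaining high-order bits form the tag. INDEX_BITS, split_address, and tag_matches are hypothetical names.

```python
INDEX_BITS = 3  # assumed: 2**3 = 8 registers, selected by the low-order bits


def split_address(address):
    """Split an address into (register index, remainder used as the tag)."""
    index = address & ((1 << INDEX_BITS) - 1)  # portion selecting the register
    tag = address >> INDEX_BITS                # remainder of the address
    return index, tag


def tag_matches(tags, address):
    """True when the selected register's tag equals the remainder of the address."""
    index, tag = split_address(address)
    return tags[index] == tag


# Record the tag for one address, then compare other addresses against it.
tags = [0] * (1 << INDEX_BITS)
index, tag = split_address(0b10110101)
tags[index] = tag
```

Two addresses that share the same low-order bits select the same register but carry different tags, which is how one register can correspond to different memory addresses at different times. Validity is checked separately using v 109.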
What happens next depends on the kind of operation indicated by the control signal. If the operation is a read operation and v 109 in register 108 indicates that the register contains a valid copy of the data at the location in memory 117 indicated by the address, the contents of data 113 are output to CPU 101; if the operation is a write operation, the data is written from CPU 101 to data 113 and dirty bit 115 is set to indicate that data 113 has changed. In the latter case, control 105 must then write the value of data 113 back to the location in memory 117 specified by the address and reset dirty bit 115.
If tag 111 in register 108 addressed by the address from the CPU does not have the same value as the remainder of the bits in the address, or if v bit 109 indicates that data 113 in register 108 is invalid, the cache does not contain a copy of the data at the location specified by the address and a cache miss results. If the miss is on a write operation, cache 103 constructs an entry for the address by placing the data being written in data 113 of register 108 specified by the address, placing the remainder of the address in tag 111 of that register 108, setting v 109 to indicate validity, and setting d 115 to indicate that the value of data 113 has changed. Control 105 then writes the contents of data 113 to memory 117 as described above.
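A minimal sketch of the write path described above, covering both a write hit and a write miss, is given below for illustration. It assumes that the register index is the address modulo the number of registers and that memory 117 can be modeled as a Python list; Register, cpu_write, and the sizes used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Illustrative register 108: v 109, tag 111, data 113, d 115."""
    v: bool = False
    tag: int = 0
    data: int = 0
    d: bool = False


def cpu_write(store, memory, address, value):
    """Model a CPU write; return True on a hit and False on a miss."""
    index, tag = address % len(store), address // len(store)  # assumed split
    reg = store[index]
    hit = reg.v and reg.tag == tag

    # On a miss an entry is constructed for the address; on a hit only the
    # data changes. Either way the register ends up as follows.
    reg.tag = tag
    reg.data = value
    reg.v = True
    reg.d = True        # data 113 now differs from memory 117

    # Control then writes data 113 back to memory and resets the dirty bit.
    memory[address] = reg.data
    reg.d = False
    return hit


store = [Register() for _ in range(8)]   # assumed store size
memory = [0] * 64                        # assumed memory size
first_was_hit = cpu_write(store, memory, 13, 99)
second_was_hit = cpu_write(store, memory, 13, 100)
```

In this sketch the first write to an address misses and the second hits, and after each call memory holds the written value and the dirty bit is clear.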
If the miss is on a read operation, control 105 responds to the miss by generating a control signal to CPU 101 which causes CPU 101 to wait until there is a valid cache entry. Control 105 then provides the address and a control signal indicating a read operation to memory 117, which responds with the data at the location specified by the address. Control 105 next locates the proper register 108 for the data, loads the data into data 113 and the remainder of the address into tag 111, sets v 109 to indicate a valid entry, and resets d bit 115. Thereupon, CPU 101 reattempts the memory reference. Since the data is now contained in the cache, the reference succeeds and the data is output as described above.
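The read path, including the handling of a read miss, may be sketched as follows. The waiting of CPU 101 and the reattempted reference are modeled simply as the function filling the register and then returning the data; the index and tag split, the list used for memory 117, and the names Register and cpu_read are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Illustrative register 108: v 109, tag 111, data 113, d 115."""
    v: bool = False
    tag: int = 0
    data: int = 0
    d: bool = False


def cpu_read(store, memory, address):
    """Model a CPU read; return (data, missed)."""
    index, tag = address % len(store), address // len(store)  # assumed split
    reg = store[index]
    missed = not (reg.v and reg.tag == tag)
    if missed:
        # Read miss: the CPU waits while control fetches the data from
        # memory and builds a valid, clean entry in the selected register.
        reg.data = memory[address]
        reg.tag = tag
        reg.v = True
        reg.d = False
    # The (re)attempted reference now finds the data in the cache.
    return reg.data, missed


store = [Register() for _ in range(8)]   # assumed store size
memory = [0] * 64                        # assumed memory size
memory[13] = 7
first = cpu_read(store, memory, 13)      # miss, entry is loaded
second = cpu_read(store, memory, 13)     # hit on the loaded entry
```

The second reference to the same address is satisfied from the register without consulting memory.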
Another consistency problem arises when the computer system which includes cache 103 allows I/O devices to bypass CPU 101 and write data directly to memory 117. In this case, when an I/O write operation alters data in memory 117 of which there is a copy encached in cache 103, some change must be made in cache 103. In cache 103 of FIG. 1, control 105 receives the address each time there is an I/O write operation, and if there is a hit as described above, control 105 sets v 109 in the register 108 specified by the address to indicate that the contents of data 113 are invalid. As described above, on the next reference to the memory location which received the data from I/O, a miss will result and the proper value of the data will be written to cache 103.
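The invalidation on an I/O write described above can be illustrated with the following sketch. As before, the address split, the list standing in for memory 117, and the names Register, cpu_read, and io_write are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Illustrative register 108: v 109, tag 111, data 113, d 115."""
    v: bool = False
    tag: int = 0
    data: int = 0
    d: bool = False


def locate(store, address):
    """Return (selected register, tag) under the assumed address split."""
    return store[address % len(store)], address // len(store)


def cpu_read(store, memory, address):
    """Return the data at address, loading the register on a miss."""
    reg, tag = locate(store, address)
    if not (reg.v and reg.tag == tag):
        reg.data, reg.tag, reg.v, reg.d = memory[address], tag, True, False
    return reg.data


def io_write(store, memory, address, value):
    """Model an I/O device writing directly to memory, bypassing the CPU."""
    memory[address] = value
    reg, tag = locate(store, address)
    if reg.v and reg.tag == tag:
        reg.v = False   # hit: mark the encached copy invalid


store = [Register() for _ in range(8)]   # assumed store size
memory = [0] * 64                        # assumed memory size
memory[13] = 7
before = cpu_read(store, memory, 13)     # encaches the old value
io_write(store, memory, 13, 42)          # invalidates the encached copy
after = cpu_read(store, memory, 13)      # misses and reloads from memory
```

Because the stale copy is only marked invalid, the new value reaches the cache on the next CPU reference to that location rather than at the time of the I/O write.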
In the prior art, data caches have been characteristic of relatively large and expensive digital computer systems. One reason for this has been the high cost of high-speed memory; another has been the high cost of the complex control logic required for such a cache. Technical progress has reduced the cost of high-speed memory, but there has been no corresponding reduction in the cost of the components of the control logic. Simplification of control logic has thus become a major problem in cache design. One solution to this problem is provided by the present invention.