A computer system is typically made up of one or more central processing unit (CPU) modules which are linked to a main memory module via a system bus which carries data, addresses, and control signals to and from the modules. For purposes of this document, the term "data" will refer to all information which is stored in the various memory devices of the computer, and retrieved and processed by the CPU(s), even though such information may comprise addresses. Stated differently, "data" will include the binary values stored in the memory devices and processed by the CPU(s), even though those stored and processed values may in fact represent an address that the CPU(s) wishes to access. On the other hand, the term "address" will refer to the binary values which are driven onto a bus for the purpose of accessing or storing data in a memory device.
Ordinarily, a CPU will retrieve data from main memory, perform some operation on it, and eventually store the results back in main memory. System performance is significantly affected by the number of times a given CPU must access main memory in order to get the data it needs to process. Particularly in a multi-processor configuration, in which a number of processors may be vying for access to the system bus, any single processor may be stalled while it waits for its turn to use the system bus and access main memory. Such stalling has a tendency to degrade system performance.
To reduce the number of times a given processor must access main memory, many systems include a relatively small memory within the CPU module, which is referred to as a "cache" and which principally holds the data which the CPU is currently processing, or likely to process in the near future. Given that the availability of data in a cache will reduce the latency associated with reading data from main memory and also reduce a given CPU's use of the system bus, system performance can be significantly improved through the use of a cache. Known in the computer architecture art are two types of caches: a "write through" cache and a "write-back" cache.
In a write through cache system, when the CPU requests data which is not then in the cache, the data is taken from main memory and stored in the cache. When the cache is filled, main memory transfers more than just the single data item sought by the CPU by including in the transfer other data that was in the same locality in main memory as the originally requested data. In many instances, after the CPU processes the requested data item it will want to continue processing other data items that were in the same locality in main memory and that were transferred with the requested data. If this is so, the CPU will not have to go to main memory to process the next data item, and the system performance will be correspondingly improved, as discussed above.
One drawback of a write through cache, however, is that any time the CPU modifies a piece of data it will automatically update main memory's copy of that data. This updating is accomplished by having the CPU write the data to main memory over the system bus, which consequently causes system latencies. Accordingly, a write through cache improves system performance somewhat, by reducing the number of accesses the processor must make to main memory to read data for processing; however, it still has the disadvantage of requiring modified data to be written to main memory, which has a tendency to degrade system performance.
A write-back cache, on the other hand, operates similarly to the write through cache in that it, too, is filled when the CPU requests data which is not then in the cache. Also, it is filled with more than just the single data item requested by the CPU, which, again, increases the probability that the next piece of data sought by the CPU will be in the cache. Unlike the write through cache, the write-back cache does not automatically update main memory each time it modifies a piece of data. Accordingly, system bus traffic is reduced, which improves processing performance. For this reason, a write-back cache would be more desirable than a write through cache. There is, however, a disadvantage to the use of a write-back cache in place of a write through cache, which relates to the recovery of corrupted data and which can be understood only after a brief explanation of the function of parity and ECC protection.
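The contrast between the two write policies described above can be sketched in a few lines of Python. This is a hypothetical toy model, not drawn from this document; names such as SimpleCache are illustrative only, and real caches operate on multi-word lines rather than single addresses.

```python
# Toy model (illustrative only): the sole behavioral difference modeled
# here is whether a CPU write to the cache also updates main memory.

class SimpleCache:
    """Minimal cache keyed by address; 'memory' stands in for main memory."""
    def __init__(self, memory, write_through):
        self.memory = memory            # backing "main memory" (a dict)
        self.lines = {}                 # cached copies, keyed by address
        self.write_through = write_through
        self.bus_writes = 0             # system-bus write transactions used

    def read(self, addr):
        if addr not in self.lines:      # miss: fill the cache from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value   # update main memory immediately...
            self.bus_writes += 1        # ...at the cost of a bus transaction
        # a write-back cache defers the memory update until later

mem = {0: 10, 1: 20}
wt = SimpleCache(dict(mem), write_through=True)
wb = SimpleCache(dict(mem), write_through=False)
for c in (wt, wb):
    c.read(0)
    c.write(0, 99)
print(wt.bus_writes, wt.memory[0])      # 1 99 -- memory updated, bus used
print(wb.bus_writes, wb.memory[0])      # 0 10 -- memory stale, no bus traffic
```

The final two lines capture the trade-off: the write-back cache saves the bus transaction, but main memory is left holding a stale copy of the data.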
Although on a statistical basis it does not happen frequently, occasionally when information is transferred or stored in a computer system it is possible for the binary values of the information to be changed due to any one of a number of reasons, including electrical interference or "noise", component failures, and the like. For example, one node may transfer a certain binary value, such as 00000000, but due to bus noise the receiving node may receive the binary value 00000001. Alternatively, a given node may store the binary value 00000000 in a random access memory ("RAM"), but due to a failure of the components which make up the RAM, that stored value may change to 00000001. Absent any other information, in both instances it would be impossible for the system to tell that an error had occurred or to reconstruct the original binary values.
In order to protect the integrity of information within the computer system, many designs include parity information, typically a single bit called a "parity bit", which is computed on the basis of the binary value of the data that the parity bit is intended to represent. The parity bit is usually transferred and stored along with its related data. Consequently, if a parity bit is checked and it does not match the data with which it is associated, that data will be treated as corrupt data. In more sophisticated computer systems, data may also be protected through the use of ECC, which involves the generation of an ECC code, also in the form of a binary value, for the data. Unlike parity bits, which simply indicate whether the data is correct, but cannot be used to reconstruct corrupted data, an ECC code can be used to restore corrupted data to its original value, provided the magnitude of the data corruption is not too great. Therefore, when a parity error is detected, ECC protected data may be reconstructed to its original, correct value.
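The distinction drawn above can be illustrated with a short sketch. The code below is hypothetical and not from this document; it uses an even-parity bit, and the well-known Hamming(7,4) code as a simple stand-in for an ECC code, to show that parity detects a single-bit error while ECC both detects and corrects it.

```python
# Illustrative sketch: even parity vs. a Hamming(7,4) ECC code.

def parity_bit(bits):
    """Even parity: XOR of all data bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def hamming74_encode(d):
    """d = [d1, d2, d3, d4] -> 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4              # covers codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4              # covers codeword positions 2,3,6,7
    p4 = d2 ^ d3 ^ d4              # covers codeword positions 4,5,6,7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(c):
    """Return the codeword with any single-bit error repaired."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4     # syndrome = 1-based error position, 0 = none
    if pos:
        c[pos - 1] ^= 1
    return c

data = [1, 0, 1, 1]
code = hamming74_encode(data)
corrupt = list(code)
corrupt[4] ^= 1                    # a single bit flips in "storage"
# Parity only reveals that something is wrong...
print(parity_bit(corrupt) != parity_bit(code))   # True: error detected
# ...but the ECC code restores the original value.
print(hamming74_correct(corrupt) == code)        # True: error corrected
```

Note that if two bits of the codeword were flipped, the single-error-correcting code above could no longer restore the data, which is why the text qualifies correction as possible only when "the magnitude of the data corruption is not too great."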
In many existing computer systems which use ECC, the circuitry necessary to generate and check ECC codes, as well as correct erroneous data, is implemented in main memory. Therefore, any data that was ever stored in main memory is protected by ECC, and consequently might be reconstructed and corrected if a parity check detected that it was erroneous.
Referring once again to the operation of caches, the disadvantage of a write-back cache relative to a write through cache can now be seen, if it is desirable to have data protected by ECC. Specifically, in a system which uses a write-back cache, data which has been modified by the CPU is not automatically written back to main memory, resulting in the cache having the only current version of that data. More importantly, in a system in which the ECC is implemented in main memory, the modified data will not be protected by ECC because it has never been stored in main memory.
This disadvantage of a write-back cache can be more fully illustrated with an example. Assume that an ECC protected main memory fills a write-back cache with current data, the CPU reads that data out of the cache and modifies it, and then writes that modified data back into the cache along with its parity bit. Further assume that while the modified data is stored in the RAMs of the cache a single bit of the data gets corrupted. Finally, assume that the CPU now reads the modified and corrupted data, but detects the data error through a check of the parity bit. Although the original data was protected by the ECC of main memory, the modified version of the data was never returned to main memory by the write-back cache for ECC encoding. Consequently, although the CPU can detect that the data is erroneous, the system cannot reconstruct that data by referring back to the ECC coding of the original data because the data was subsequently modified.
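The example above can be replayed as a small sketch. This code is hypothetical and not from this document; the address and the data values are illustrative only.

```python
# Illustrative replay of the failure scenario: a write-back cache holds
# the only copy of modified, parity-protected data; a bit flip is
# detectable but cannot be repaired from main memory, whose ECC covers
# only the stale, unmodified value.

def parity(value):
    """Even-parity bit of an 8-bit value."""
    return bin(value).count("1") & 1

main_memory = {0x10: 0b00001111}   # ECC-protected original data
# Cache fill: the cache receives the current data and its parity bit.
cache = {0x10: (main_memory[0x10], parity(main_memory[0x10]))}

# The CPU modifies the cached data; write-back defers the memory update,
# so main memory (and its ECC) never sees the modified value.
modified = 0b00001100
cache[0x10] = (modified, parity(modified))

# A single bit of the cached copy is corrupted in the cache RAMs.
value, stored_parity = cache[0x10]
corrupted = value ^ 0b00000001

print(parity(corrupted) != stored_parity)   # True: parity error detected
print(main_memory[0x10] == modified)        # False: memory holds stale data
```

The two printed results restate the dilemma: the CPU can tell that the cached data is bad, but the only ECC-protected copy in main memory is the obsolete, unmodified version, so the modified value cannot be reconstructed.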
By contrast, a system using a write through cache would not encounter this difficulty because, as discussed above, each time the CPU modifies data that is stored in the write through cache, main memory is automatically updated. At the time of updating, main memory will generate an ECC code for the modified data. As a result, if, using the same example above, the CPU were to read the modified and corrupted data from the write through cache and detect an error by checking the parity bit, it would invoke a software routine which flushes the bad data from the cache and replaces it with the correct data from main memory. The CPU can then continue its processing function using the correct data.
In summary, a write through cache has the disadvantage of automatically updating main memory each time it modifies data in the cache, which causes a degradation in system performance due to the delays associated with using the system bus to access main memory. The write through cache, however, has the advantage of having all of the data stored in it ECC protected, precisely because it does write all modified data back to main memory. On the other hand, a write-back cache has the advantage of improving system performance, relative to a write through cache, because it does not automatically update memory each time cache data is modified, resulting in fewer system bus transactions. The write-back cache, however, has the disadvantage of not ensuring ECC protection of all data in the cache for the reasons discussed above, even though the data that has been in main memory may be ECC protected.
It should be noted that new system designs may provide ECC code generation and checking circuitry, as well as error correction circuitry, on the CPU. This arrangement enables the new systems to take advantage of a write-back cache, and its corresponding contribution to performance improvement, while also taking advantage of the protection of data integrity through ECC. All data processed by the CPU will have an ECC code, even if it has never been stored in main memory. The present invention, however, is designed to enable existing and future systems whose CPU(s) do not have ECC circuitry built in to be modified or constructed such that the system can use such CPU(s) with a write-back cache, and still have ECC protection. In order for a system to be eligible for this modification, however, it is necessary for the CPU(s) of the system to have the ability to detect single bit parity errors and, in response, invoke an error handling procedure which is typically contained in a software routine.
Accordingly, it is an object of the present invention to provide a write-back cache which can be easily and inexpensively integrated into a computer system while also achieving ECC protection of all data in the cache. A feature of the invention is that ECC codes are generated and checked, and errors are corrected, by a memory interface, and ECC codes are stored in the cache. An advantage, therefore, is that many existing CPUs may be able to take advantage of this invention and its beneficial effect on system performance, provided that the CPUs are capable of detecting single bit parity errors and responding by invoking an error handling routine.
Moreover, system designs which have a write through cache may easily incorporate the present invention because they already have software which includes an error handling routine. When a parity error is detected, write through caches use such a routine to flush the bad data from the cache and replace it with good data from main memory. Therefore, another advantage of the present invention is that its implementation in a system which already has a write through cache would require minimal software modification because the software error routine required by the invention is basically the same as the one already used by write through caches, except that instead of flushing the bad data, the bad data is returned to main memory via the memory interface. This similarity results in greater ease and less expense in incorporating the present invention in existing designs.
Another object of the present invention is to make a write-back cache design as robust as a write through cache design so that it is capable of supporting ECC protection of all data in the cache. In so doing, the write-back cache would no longer be less desirable than a write through cache with respect to protection against data corruption. A feature of the present invention is to have all data in cache, including data modified by the CPU, ECC encoded by the memory interface. Therefore, another advantage of the present invention is that systems that previously did not use a write-back cache, due to its incompatibility with ECC, can now do so, and thereby benefit from the effect the write-back cache has on system performance.
A further object of the present invention is to not place the responsibility for ECC coding, checking, and error correcting on the system's CPU. This is to ensure that the invention is compatible with many existing CPUs, and to ensure that the CPU will not have to be involved in ECC functions, other than, in certain instances, detecting a parity error and responding by invoking an error handling routine. By having the ECC function performed by the memory interface, a system in which the CPU can already detect parity errors and invoke a correction procedure will not require a substitute CPU, and its related expense, in order to take advantage of the invention. Moreover, the ECC function is kept out of the performance-critical path between the CPU and cache.