A typical high-performance computer system comprises a Central Processing Unit (CPU), a main memory, a cache memory, and a Memory Management Unit (MMU) for managing the flow of data between the memories themselves and between the memories and the CPU. In such a system, a two-tiered addressing scheme is commonly implemented using virtual and physical addresses. Virtual addresses are the addresses generated and used by the CPU in the course of executing a program, and physical addresses are the addresses at which actual memory locations are found in main memory. One of the functions of the MMU is to translate virtual addresses into physical addresses so that the CPU can access physical memory locations.
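The translation function described above can be illustrated with a minimal sketch. The text does not specify how the MMU performs translation, so the sketch assumes a single-level page table with 4 KiB pages; the names `PAGE_SIZE`, `page_table`, and `translate` are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (not specified in the text)

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 9, 2: 3}


def translate(virtual_address: int) -> int:
    """Translate a virtual address into a physical address."""
    # Split the virtual address into a page number and an offset within the page.
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    if virtual_page not in page_table:
        raise LookupError(f"no mapping for virtual page {virtual_page}")
    # The offset is carried over unchanged; only the page number is remapped.
    return page_table[virtual_page] * PAGE_SIZE + offset
```

For example, under these assumptions `translate(0x1234)` maps virtual page 1 to physical frame 9 and yields `0x9234`.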
Another function of the MMU is to maintain consistency between data in the main memory and the copy of that data in the cache memory. Towards this end, the MMU maintains a Cache Physical Address Table (CPAT) which maps each entry in the cache memory to a corresponding physical location in main memory. For each entry in the cache memory, there is a corresponding entry in the CPAT which contains the physical address in main memory at which the data corresponding to that cache entry is found. This physical address serves as the necessary link between the memories for maintaining data consistency. To elaborate, one policy commonly employed for maintaining data consistency is the "copy back" policy. This policy requires that, when a data entry is modified within the cache, the modified entry must at some point be copied back into the main memory. To copy the data back to main memory, the proper physical address must be provided so that the modified data is copied into the correct location. The entry in the CPAT corresponding to the modified cache entry provides this physical address. Thus, the CPAT plays a vital role in maintaining data coherence.
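The relationship between the cache, the CPAT, and the copy back policy can be sketched as follows. This is an illustrative model only, not the structure of any particular MMU: the class `CopyBackCache`, its methods, the per-entry dirty flag, and the use of a dictionary to stand in for main memory are all assumptions made for the example.

```python
class CopyBackCache:
    """Toy cache whose CPAT holds one physical address per cache entry."""

    def __init__(self, num_entries, main_memory):
        self.data = [None] * num_entries    # cached data, one slot per entry
        self.dirty = [False] * num_entries  # True if the entry was modified
        self.cpat = [None] * num_entries    # physical address of each entry
        self.main_memory = main_memory      # dict: physical address -> data

    def fill(self, index, physical_address):
        """Load data from main memory and record its address in the CPAT."""
        self.data[index] = self.main_memory[physical_address]
        self.cpat[index] = physical_address
        self.dirty[index] = False

    def write(self, index, value):
        """Modify the cached copy only; main memory is updated later."""
        self.data[index] = value
        self.dirty[index] = True

    def copy_back(self, index):
        """Copy a modified entry back to the address the CPAT supplies."""
        if self.dirty[index]:
            self.main_memory[self.cpat[index]] = self.data[index]
            self.dirty[index] = False
```

In this model a call to `write` leaves main memory unchanged, and the later `copy_back` relies entirely on `cpat[index]` to locate the destination.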
Such a computer system functions well when the data stored within the CPAT are accurate, but if some of the physical addresses in the CPAT are altered by transient errors occurring within the CPAT, system integrity may be compromised. For example, suppose that a modified entry in the cache needs to be copied back to main memory but that the corresponding CPAT entry has been corrupted such that it contains an erroneous physical address. If the error is not detected and corrected, the modified data will be copied back into the wrong location, causing two errors in the system. First, the memory location to which the modified data should have been written will still contain outdated data. Second, the location to which the modified data is actually written will now contain incorrect data. These errors can seriously compromise the integrity of the system. To improve the reliability of the computer system, a means is needed to detect and to correct the transient errors that may occur within a memory coherence table such as the CPAT.
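The two errors described above can be reproduced in a small simulation. The sketch below is hypothetical: the function name, the addresses `0x100` and `0x104`, the stored strings, and the choice of a single flipped bit as the transient error are assumptions used only to make the failure concrete.

```python
def simulate_corrupted_copy_back():
    """Return main memory after a copy back through a corrupted CPAT entry."""
    # Main memory before the copy back: address -> stored data.
    main_memory = {0x100: "old A", 0x104: "B"}

    # One cache entry holding a modified copy of the data at 0x100.
    cache_data = ["new A"]
    cpat = [0x100]  # correct physical address for cache entry 0

    # Simulated transient error: one bit of the CPAT entry flips,
    # changing the recorded address from 0x100 to 0x104.
    cpat[0] ^= 0x4

    # Copy back uses whatever address the CPAT now supplies.
    main_memory[cpat[0]] = cache_data[0]
    return main_memory


result = simulate_corrupted_copy_back()
# First error: the intended location still holds outdated data.
stale = result[0x100]
# Second error: an unrelated location has been overwritten.
clobbered = result[0x104]
```

Here `stale` is `"old A"` and `clobbered` is `"new A"`, so the intended location is never updated and the original contents of `0x104` are lost.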