1. Field of the Invention
This invention relates to symmetric multiprocessing (SMP) computer systems having a cache design, and particularly to recovering the hardware after a failure.
2. Description of Background
As SMP computer systems continue to improve in performance, their caches are growing exponentially in size. These larger caches make soft and hard array failures much more likely. Previously, a function called set delete was added to remove cache sections that have known defects. However, much of the prior art removes a large number of sets or compartments from the cache. The prior art used in the preferred embodiment allows a single compartment within a congruence class to be deleted without deleting the full compartment.
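The difference between a full compartment delete and the deletion of one compartment within a single congruence class can be illustrated with a small model. The following is only a sketch under stated assumptions: the class name SetAssociativeCache, its methods, and the cache dimensions are hypothetical and are not taken from the prior art; the model tracks only which locations are marked unusable.

```python
class SetAssociativeCache:
    """Toy model (hypothetical): congruence classes are rows, compartments are columns."""

    def __init__(self, num_classes, num_compartments):
        self.num_classes = num_classes
        self.num_compartments = num_compartments
        # (congruence_class, compartment) pairs that have been marked unusable.
        self.deleted = set()

    def delete_compartment(self, compartment):
        """Full compartment delete: remove the compartment in every congruence class."""
        for congruence_class in range(self.num_classes):
            self.deleted.add((congruence_class, compartment))

    def delete_line(self, congruence_class, compartment):
        """Delete one compartment within one congruence class only."""
        self.deleted.add((congruence_class, compartment))

    def usable_lines(self):
        """Number of cache locations still available for use."""
        return self.num_classes * self.num_compartments - len(self.deleted)


# Assumed dimensions for illustration: 1024 congruence classes, 8 compartments.
full = SetAssociativeCache(1024, 8)
full.delete_compartment(3)       # loses 1024 locations

single = SetAssociativeCache(1024, 8)
single.delete_line(17, 3)        # loses 1 location
```

In this model, the full compartment delete removes one location from every congruence class, while the single deletion removes only the defective location.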
Another aspect of the prior art of this invention allows for the purging of cache lines that have encountered an error. If the error is correctable, the data is corrected and re-enters the cache as clean data, either in the original position or in a different one. If the same set/compartment fails again (i.e., a hard failure), a system log entry is made that records all of the failing data, and that location is purged and deleted to avoid its use in the future. The preferred embodiment performs this purge/delete in hardware, while logging is done through software code.
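The purge-then-delete policy described above can be sketched as a simple decision procedure. This is an illustrative software model only, not the hardware of the preferred embodiment: the name ErrorHandler, its fields, and the returned action strings are hypothetical, and the sketch assumes that a second correctable error at the same set/compartment is what identifies a hard failure.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorHandler:
    """Hypothetical model of the purge/delete decision for correctable errors."""

    seen: set = field(default_factory=set)     # locations that have failed once
    deleted: set = field(default_factory=set)  # locations removed from future use
    log: list = field(default_factory=list)    # stands in for the software log

    def on_correctable_error(self, congruence_class, compartment):
        location = (congruence_class, compartment)
        if location in self.deleted:
            # Already removed from use; nothing further to do in this model.
            return "already_deleted"
        if location in self.seen:
            # Same set/compartment failed again: treat as a hard failure,
            # log it, then purge and delete the location.
            self.log.append(location)
            self.deleted.add(location)
            return "purge_and_delete"
        # First failure: purge the line; corrected data may re-enter as clean.
        self.seen.add(location)
        return "purge"


handler = ErrorHandler()
first = handler.on_correctable_error(5, 2)   # first failure at this location
second = handler.on_correctable_error(5, 2)  # repeat failure at this location
```

Here the first call only purges the line, while the repeat failure at the same location is logged and the location is deleted.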
Even though these hardware features provide reliability benefits, the defective parts usually have to be replaced before a restart can be attempted. This is because the Array Built-In Self-Test (ABIST) checking logic will usually not pass when fuses have not been blown for a failing part of the array. The ABIST logic will make the failing address(es) available. Even when power-on repair is applied, as described in U.S. Pat. No. 5,805,789 to Huott et al., there is a chance that no more fuses are available for the repair, and the part will need to be ordered before the customer can bring up the machine again.