The present invention relates generally to failure recovery in embedded memory arrays and, more specifically, to hard memory array failure recovery utilizing a locking structure.
Processing systems typically comprise memory circuits configured to cache program instructions, program data, and other state information related to executing the program instructions. For example, a central processing unit (CPU) may include an instruction cache for caching the program instructions, a data cache for caching program data, and one or more address translation caches for caching previously computed virtual-to-physical address mappings associated with the program instructions and program data. Each cache includes an embedded memory fabricated to include a plurality of memory cells. Certain manufacturing flaws may cause one or more individual memory cells from the plurality of memory cells to exhibit failure behavior. The failure behavior may range from a hard failure at the time of manufacture to infrequent soft failures during normal operation. The failure behavior may worsen over time as a result of physical degradation of the one or more individual memory cells, eventually resulting in a hard failure.
Hard failures due to manufacturing flaws are typically detected, and affected devices are conventionally discarded. Each discarded device effectively increases the cost of production for each passing device. Failure behavior that develops after a device passes manufacturing tests and is deployed in an end user processing system can cause errors and compromise data integrity. Each processing system that includes a failing device may require repair or replacement, which effectively increases the cost of operating the processing system. In each case, failure behavior originating from the one or more individual memory cells is detrimental and costly.
One solution for managing soft errors in memory cells involves generating and checking parity for data stored in arrays of memory cells. If a parity error is detected, then data has been corrupted within the memory cells, and the processing system may take appropriate measures to avoid propagating the corrupted data. Another solution for managing soft errors in memory cells involves generating error correction codes (ECC) when data is written to the array of memory cells and performing error correction when reading the array of memory cells. While ECC techniques represent an adequate solution for managing most soft errors, such as those due to alpha-particle strikes, hard failures still leave the processing system vulnerable to unrecoverable faults and the potential for compromised data integrity. As processing systems increase in size and complexity, and include more processing cores with additional corresponding cache memories, overall system reliability will be reduced due to hard failures in the cache memories. Reduced reliability will also lead to additional operating costs.
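By way of illustration only, the parity generation and checking described above may be sketched as follows. The sketch is written in Python and assumes even parity computed over a single stored word; the names ParityError, even_parity, write_word, and read_word are hypothetical and do not correspond to any particular hardware implementation. ECC generation and correction are not shown.

```python
class ParityError(Exception):
    """Raised when a stored word fails its parity check (hypothetical name)."""


def even_parity(word: int) -> int:
    """Return the bit that makes the total count of 1 bits (word plus parity) even."""
    return bin(word).count("1") & 1


def write_word(word: int) -> tuple[int, int]:
    """Model a write: the parity bit is generated and stored alongside the data."""
    return (word, even_parity(word))


def read_word(stored: tuple[int, int]) -> int:
    """Model a read: parity is recomputed and compared with the stored bit."""
    word, parity = stored
    if even_parity(word) != parity:
        # The data has been corrupted; the caller must avoid propagating it.
        raise ParityError("parity mismatch on read")
    return word


stored = write_word(0b10110010)

# An uncorrupted word reads back normally.
clean = read_word(stored)

# A single flipped bit changes the parity and is detected.
single_flip = (stored[0] ^ 0b100, stored[1])
try:
    read_word(single_flip)
    single_flip_detected = False
except ParityError:
    single_flip_detected = True

# Two flipped bits leave the parity unchanged and pass the check unnoticed.
double_flip = (stored[0] ^ 0b11, stored[1])
double_flip_value = read_word(double_flip)
```

In this sketch, a single corrupted bit is detected but cannot be corrected, and an even number of corrupted bits is not detected at all. A memory cell with a hard failure would produce such a corruption every time the written value differs from the value the failed cell returns, which is consistent with the limitation noted above.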
As the foregoing illustrates, what is needed in the art is a technique for managing hard failures in memory cells comprising cache memories.