1. Technical Field
The present invention relates generally to an improved data processing system, and in particular to a method and apparatus for processing errors in a data processing system. Still more particularly, the present invention provides a method, apparatus, and computer implemented instructions for parity error recovery.
2. Description of Related Art
In presently available computer systems, error detection logic and parity are used to ensure customer data integrity. Error detection involves testing for accurate storage, retrieval, and transmission of data within the computer system. Parity checking is an error detection technique that tests the integrity of digital data within the computer system or over a network. Parity checking uses an extra, ninth bit per byte that holds a 0 or 1 depending on the data content of the byte. Each time a byte of data is retrieved, transferred, or transmitted, the parity bit is tested. Even parity systems set the parity bit to 1 when the byte contains an odd number of 1 bits, so that the nine bits together always contain an even number of 1 bits. Odd parity systems set the parity bit to 1 when the byte contains an even number of 1 bits, so that the nine bits together always contain an odd number of 1 bits. A parity error is an error condition that occurs when the parity bit of a character is found to be incorrect. For example, an even parity system sets the parity bit to 0 if the number of set bits in the byte is even and to 1 if the number of set bits is odd. In this way, every byte together with its parity bit has an even number of set bits. When the data is checked, each byte is checked to make sure that it still has an even number of set bits. If an odd number of set bits is present, an error has occurred. This check is typically made each time data is read from the storage device.
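The even parity scheme described above can be sketched in a few lines. This is an illustrative model only, assuming 8-bit data with the parity bit stored as a ninth bit (bit 8); the function names `even_parity_bit`, `encode`, and `check` are hypothetical and do not describe any particular hardware implementation. The sketch also shows the known limitation of a single parity bit: any odd number of flipped bits is detected, but an even number of flips leaves the count even and goes unnoticed.

```python
def even_parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value: 1 if the byte has an
    odd number of set bits, so data plus parity holds an even count."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def encode(byte: int) -> int:
    """Store the parity bit as a ninth bit (bit 8) above the data byte."""
    return (even_parity_bit(byte) << 8) | byte


def check(word: int) -> bool:
    """Return True if the 9-bit word still has an even number of set bits,
    i.e. no parity error is detected when the data is read back."""
    return bin(word & 0x1FF).count("1") % 2 == 0


stored = encode(0xA5)               # 0b10100101 has four set bits -> parity bit 0
print(check(stored))                # no error detected
print(check(stored ^ (1 << 3)))     # one flipped data bit -> parity error
print(check(stored ^ 0b11))         # two flipped bits -> not detected
```

A single-bit upset, such as the transient soft errors discussed below, is therefore caught on the next read, but parity alone cannot say which bit flipped, so it detects the error without correcting it.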
Today, computer systems use a large quantity of semiconductor memory for temporary data storage. The types of temporary data storage include a first-level instruction cache (L1 I-Cache) and data cache (L1 D-Cache), a second-level cache (L2), a third-level cache (L3), an effective-to-real address translation (ERAT) buffer, a translation lookaside buffer (TLB), and main memory. The smaller memories within the system, such as the L1 caches, the ERAT, and the TLB, are generally referred to as arrays.
The large quantity of arrays and memory used in today's computer systems also brings higher failure rates to the overall system. Semiconductor array and memory failures include solid errors and soft errors. Solid errors are errors caused by an imperfect manufacturing process or by reliability wear-out. Soft errors are errors caused by alpha particles, cosmic rays, or electrical noise within the computer system. Soft errors are transient. In general, the soft error rate in semiconductor arrays and memory is orders of magnitude higher than the solid error rate.
In some cases, computer hardware with a high failure rate, such as the L2 cache, the L3 cache, or system memory, uses error correction logic to minimize the impact of a failure on the system and improve overall system availability. Error correction logic, however, adds cost and additional circuit delay. As a result, system cost may increase and overall system performance may be reduced.
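To illustrate why error correction costs more than parity, the following sketch uses the classic Hamming(7,4) single-error-correcting code. This is an assumption chosen for readability: the text above does not say which code the L2, L3, or memory error correction logic uses, and practical memory subsystems generally use wider codes. The functions `hamming74_encode` and `hamming74_decode` are hypothetical names. Where parity spends one check bit per eight data bits and can only detect an error, this code spends three check bits per four data bits, plus the syndrome logic on every read, so that it can locate and flip back a single erroneous bit.

```python
from typing import List, Tuple


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7).
    Check bits sit at positions 1, 2, and 4; data bits at 3, 5, 6, and 7."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    c = [0] * 8                                # index 0 unused, so c[pos] is position pos
    c[3], c[5], c[6], c[7] = d[0], d[1], d[2], d[3]
    c[1] = c[3] ^ c[5] ^ c[7]                  # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]                  # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]                  # covers positions with bit 2 set
    return c[1:]


def hamming74_decode(code: List[int]) -> Tuple[int, int]:
    """Correct at most one flipped bit and return (data nibble, syndrome).
    A syndrome of 0 means no error; otherwise it is the flipped position."""
    c = [0] + list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos                    # XOR of positions holding a 1
    if syndrome:
        c[syndrome] ^= 1                       # flip the erroneous bit back
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                                   # simulate a soft error at position 5
print(hamming74_decode(word))                  # data recovered, syndrome names position 5
```

The extra check bits are the storage cost, and the syndrome computation sits in the read path, which is the source of the added circuit delay.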
Therefore, it would be advantageous to have an improved method, apparatus, and computer implemented instructions for recovering from soft errors in computer arrays protected by parity error checking in a data processing system.