The present invention relates to methods and apparatus for handling processor errors in a multi-processing system and, in particular, for re-allocating processor tasks among sub-processing units of the multi-processing system when a processor error occurs.
Real-time, multimedia applications are becoming increasingly important. These applications require extremely fast processing speeds, on the order of many thousands of megabits of data per second. While single processing units are capable of fast processing speeds, they cannot generally match the processing speeds of multi-processor architectures. Indeed, in multi-processor systems, a plurality of sub-processors can operate in parallel (or at least in concert) to achieve desired processing results.
The range of computers and computing devices that may employ multi-processing techniques is extensive. In addition to personal computers (PCs) and servers, these computing devices include cellular telephones, mobile computers, personal digital assistants (PDAs), set top boxes, digital televisions and many others.
A design concern in a multi-processing system is how to respond when one sub-processing unit exhibits a processing error. Indeed, a processing error could affect the overall performance of the multi-processing system and adversely impact a user's real-time multimedia experience. This is particularly true when the result produced by one sub-processing unit is to be used by other sub-processing units in order to achieve a desired result.
Hard processor errors, such as error correction code (ECC) errors, parity errors, processor hang-ups, etc., may be characterized as fatal errors or recoverable errors. Fatal errors may occur due to operating system errors, kernel errors, etc., while recoverable errors generally do not involve operating system errors or kernel errors. When a recoverable error occurs, it would be desirable to continue executing the processor tasks without violating any real-time processing deadlines or processing requirements. Such violations would likely occur if recovery required re-booting the affected sub-processor and re-executing the processor tasks from the beginning. Until the present invention, this has not been possible.
Therefore, there is a need in the art for new methods and apparatus for achieving efficient multi-processing that reduce the adverse effects of hard processor errors.