Network processors (NPs) are employed in many of today's communications products in place of traditional fixed hardware such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs), primarily because the architecture of these processors combines the flexibility of a software based feature set with the high performance of ASICs. Network processors utilize parallel processing or serial pipelines and are programmable like general purpose microprocessors, but are optimized for the packet processing operations required by data packet network communication devices.
Network processors execute what is commonly referred to as microcode to perform data path packet processing functions. A network processor typically has a set of software threads (also referred to as tasks) which are spawned to perform packet processing operations by executing specific pieces of microcode.
Memory content corruption, for example a soft-error causing a memory bit to invert or “flip”, in a memory device used by the network processor may cause execution of one or more threads to lock up if the error corrupts a microcode instruction or a data structure used by the network processor. Additionally, a software bug or component defect in the network processor could interfere with normal processing, which could also lead to thread execution lockups.
The result of a thread execution lockup is that the locked up thread no longer processes data path traffic, which can lead to a communication service outage or silent failure of the network communications device.
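The way a single flipped bit in a microcode instruction can lock up a thread may be illustrated with a minimal sketch. The instruction encoding, the opcode names (HALT, DEC, JNZ, JMP), and the run function below are hypothetical and invented purely for illustration; they do not correspond to any actual network processor instruction set. A step budget stands in for the observation that the thread never completes.

```python
# Toy 8-bit "microcode" words: upper 4 bits = opcode, lower 4 bits = operand.
# All opcodes and the encoding are hypothetical, for illustration only.
HALT, DEC, JNZ, JMP = 0x0, 0x1, 0x2, 0x3


def encode(opcode, operand=0):
    """Pack an opcode and a 4-bit operand into one instruction word."""
    return (opcode << 4) | (operand & 0xF)


def run(program, counter, max_steps=1000):
    """Execute the program; return True if it reaches HALT within max_steps.

    Returning False models a thread that is locked up: it keeps executing
    but never finishes its packet processing task.
    """
    pc = 0
    for _ in range(max_steps):
        word = program[pc]
        opcode, operand = word >> 4, word & 0xF
        if opcode == HALT:
            return True
        if opcode == DEC:
            counter -= 1
            pc += 1
        elif opcode == JNZ:   # jump to operand only while counter is non-zero
            pc = operand if counter != 0 else pc + 1
        elif opcode == JMP:   # unconditional jump
            pc = operand
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return False


# A short loop: decrement the counter until it reaches zero, then halt.
program = [encode(DEC), encode(JNZ, 0), encode(HALT)]

# Soft-error: invert one bit of the second instruction word.
# In this toy encoding JNZ (0x2) becomes JMP (0x3), so the loop never exits.
corrupted = list(program)
corrupted[1] ^= 1 << 4

print("intact program halts:   ", run(program, counter=3))
print("corrupted program halts:", run(corrupted, counter=3))
```

In this sketch the corrupted thread does not crash or raise an error; it simply spins, which is consistent with the silent failure mode described above.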
Soft-errors (single bit flips) can be mitigated effectively with hardware based error correction coding (ECC) protection. However, in many cases it is not practical or even feasible to have 100% ECC coverage across all memories of a given network processor. Furthermore, ECC does not protect against multi-bit corruption or against microcode software defects, either of which can also lead to memory corruption and subsequent network processor thread execution lockup.
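The difference between single bit and multi-bit corruption can be made concrete with a small sketch. It assumes a textbook Hamming(7,4) single error correcting code, chosen only for brevity; the text does not state which code any particular network processor memory uses, and the function names are hypothetical.

```python
from itertools import combinations


def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (list of bits).

    Bit positions 1..7 hold: p1 p2 d0 p4 d1 d2 d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(bits):
    """Correct at most one flipped bit and return the 4 data bits."""
    bits = list(bits)
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position
    if syndrome:              # non-zero syndrome names the suspect position
        bits[syndrome - 1] ^= 1
    d = [bits[2], bits[4], bits[5], bits[6]]
    return sum(b << i for i, b in enumerate(d))


def flip(bits, *positions):
    """Return a copy of the codeword with the given 0-based bits inverted."""
    out = list(bits)
    for p in positions:
        out[p] ^= 1
    return out


value = 0b1011
codeword = hamming74_encode(value)

# Any single bit flip is corrected.
single_ok = all(hamming74_decode(flip(codeword, i)) == value for i in range(7))

# Any double bit flip is "corrected" to the wrong value without warning.
double_ok = any(
    hamming74_decode(flip(codeword, i, j)) == value
    for i, j in combinations(range(7), 2)
)

print("all single bit flips recovered:", single_ok)
print("any double bit flip recovered: ", double_ok)
```

With this particular code, a two bit error produces a non-zero syndrome that points at a third, innocent bit, so the decoder returns corrupted data as if it were valid. Codes with an additional overall parity bit can detect, though still not correct, such double errors.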
Hardware based ECC is not always feasible for various reasons, such as one or a combination of the following: added expense, insufficient space on the network processor to accommodate the extra hardware logic required for ECC codes, and performance degradation associated with the ECC hardware.
Good hardware design and component quality can reduce but cannot completely eliminate the possibility of memory corruption due to soft-errors. Similarly, good software development practices can reduce but cannot completely eliminate the possibility of software bugs that escape development testing.
Therefore, a way of mitigating the undesirable effects of network processor thread execution lockups that does not require ECC hardware is desired.