Conventionally, testing a computing or networking system that employs a fault tolerance strategy for availability involves injecting a failure into the system. For example, to ensure that the system responds appropriately to a process failure, an otherwise healthy process is halted, the system's response is observed, and the results are recorded for further use. However, conventional approaches have not been extended to test all system components at all times. Instead, they focus on testing specific system components at specific times. As a result, system errors may go undetected for long periods of time, reducing overall system availability. These consequences are exacerbated in production systems, where reliability and availability cannot be compromised.
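The conventional failure-injection test described above (halt a healthy process, observe the response, record the result) can be illustrated with a minimal sketch. It is not taken from any particular system: the function name `inject_process_failure`, the stand-in child process, and the fields of the returned record are all assumptions made for illustration. The only response it observes is whether the halted process exits.

```python
import subprocess
import sys
import time


def inject_process_failure(timeout_s: float = 5.0) -> dict:
    """Start a healthy child process, halt it, and record what is observed.

    Hypothetical helper for illustration only.
    """
    # Stand-in for a "healthy" process: a Python child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    record = {"pid": child.pid, "healthy_before": child.poll() is None}

    # Inject the failure: forcibly halt the process.
    started = time.monotonic()
    child.kill()

    # Observe the response: here, only whether the process exits in time.
    try:
        child.wait(timeout=timeout_s)
        record["exited"] = True
    except subprocess.TimeoutExpired:
        record["exited"] = False

    # Record the results for further use.
    record["exit_latency_s"] = time.monotonic() - started
    record["returncode"] = child.returncode
    return record


if __name__ == "__main__":
    print(inject_process_failure())
```

In a real test, the observation step would instead check the behavior of interest, for example whether a supervisor restarts the process or a standby takes over. Note that the sketch targets one component at one moment, which is the limitation of conventional approaches noted above.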
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation.