Error handling is an important aspect of implementing a distributed system in which multiple interdependent components operate. When a fault occurs in one component of the distributed system, the remote components that depend on the faulty component should not crash as a result. Rather, those components should handle errors from the faulty component gracefully, for example, by releasing any affected resources and properly reporting the errors.
However, testing error handling in a component of a distributed system is usually difficult because the remote components it depends on may not be open to modification or may be unsuitable for modification for testing purposes. One solution in such a situation is to use a fault injection technique that does not require modifying the remote components. One possible approach to fault injection is to introduce a network proxy component that is capable of injecting faults into network calls. Unfortunately, such a proxy can introduce unnecessary delays and complicate the way components register with one another. More importantly, a network proxy can simulate only very simplistic failure scenarios, such as dropped network packets and failed calls. More sophisticated failure scenarios, such as returning incorrect or unexpected results or failing tasks produced by asynchronous calls, usually cannot be simulated through a network proxy.
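The following Python sketch illustrates, under stated assumptions, the more sophisticated failure scenarios mentioned in the preceding paragraph. The fault is injected in-process at the call boundary rather than on the network, so the asynchronous task returned by a remote call can be failed outright or completed with an unexpected result, and the dependent component's graceful handling (releasing its resource and reporting the error) can be checked. The names `inject_fault`, `fetch_quota`, `Consumer`, and `run_scenario`, and the `mode` values, are hypothetical and chosen only for illustration; they are not taken from any particular system or library, and the sketch is not presented as the claimed technique.

```python
import asyncio
from typing import Awaitable, Callable, Optional

RemoteCall = Callable[[str], Awaitable[object]]


class InjectedFault(Exception):
    """Hypothetical exception raised in place of a real remote failure."""


def inject_fault(call: RemoteCall, mode: str) -> RemoteCall:
    """Wrap a remote call so its asynchronous result can be tampered with.

    mode is a hypothetical switch: "none" passes the call through,
    "fail_task" fails the awaited task, and "wrong_result" returns an
    unexpected value after the real call completes.
    """
    async def wrapper(user: str) -> object:
        if mode == "fail_task":
            raise InjectedFault("injected failure of asynchronous task")
        result = await call(user)
        if mode == "wrong_result":
            return {"unexpected": result}  # wrong type: a dict, not an int
        return result
    return wrapper


async def fetch_quota(user: str) -> object:
    """Hypothetical stand-in for a call to a remote component."""
    await asyncio.sleep(0)  # simulate an asynchronous round trip
    return 42


class Consumer:
    """Hypothetical dependent component whose error handling is under test."""

    def __init__(self, remote: RemoteCall) -> None:
        self.remote = remote
        self.open_handles = 0        # resources currently held
        self.errors: list[str] = []  # errors reported instead of crashing

    async def handle(self, user: str) -> Optional[int]:
        self.open_handles += 1  # acquire a resource for this request
        try:
            result = await self.remote(user)
            if not isinstance(result, int):
                raise TypeError(f"unexpected result: {result!r}")
            return result
        except Exception as exc:  # degrade gracefully on any remote fault
            self.errors.append(repr(exc))
            return None
        finally:
            self.open_handles -= 1  # always release the resource


async def run_scenario(mode: str) -> tuple[Optional[int], int, int]:
    """Return (value, open_handles, error_count) for one injected mode."""
    consumer = Consumer(inject_fault(fetch_quota, mode))
    value = await consumer.handle("alice")
    return value, consumer.open_handles, len(consumer.errors)


if __name__ == "__main__":
    for mode in ("none", "fail_task", "wrong_result"):
        print(mode, asyncio.run(run_scenario(mode)))
```

Running the script prints one line per mode. With no fault the consumer returns 42; in both fault modes it returns `None`, records exactly one error, and leaves `open_handles` at zero, which is the graceful behavior the test is meant to confirm.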
Throughout the description, similar reference numbers may be used to identify similar elements.