Periodically, applications and services in a distributed system terminate abnormally. Often, an abnormal termination is caused by a state change that should not have occurred. In a distributed computing environment, it can be difficult and time-consuming to determine the cause of the abnormal termination. This may be especially true for distributed computing environments that include non-deterministic components (e.g., applications or services).
Debugging in a non-deterministic system is more difficult than debugging in a deterministic system. In a deterministic system, a given set of input messages always produces the same state and the same output. Therefore, a fault may be recreated by applying the same set of inputs (e.g., messages) that originally caused the fault. In a non-deterministic system, on the other hand, a single set of input messages may lead to different states and cause different output messages to be generated. Therefore, recreating a fault may require 100 or more executions.
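The contrast between the two kinds of system can be illustrated with a minimal sketch. It is not taken from any particular system: the message format, the function names (`run_deterministic`, `run_nondeterministic`, `executions_until_fault`), and the use of random reordering as a stand-in for a source of non-determinism such as thread scheduling or network delay are all assumptions made for illustration.

```python
import random

# Hypothetical input set: each message is an (operation, value) pair.
MESSAGES = [("add", 2), ("mul", 3), ("add", 4)]


def run_deterministic(messages):
    """Apply messages strictly in the given order.

    The same inputs always yield the same final state.
    """
    state = 0
    for op, value in messages:
        if op == "add":
            state += value
        elif op == "mul":
            state *= value
    return state


def run_nondeterministic(messages, rng):
    """Apply the same messages in an order chosen by rng.

    The random reordering is an assumed stand-in for scheduling or
    network effects that the caller cannot control.
    """
    reordered = list(messages)
    rng.shuffle(reordered)
    return run_deterministic(reordered)


def executions_until_fault(messages, faulty_state, rng, max_runs=10_000):
    """Count how many executions are needed before faulty_state recurs.

    Returns None if the state is not reached within max_runs.
    """
    for attempt in range(1, max_runs + 1):
        if run_nondeterministic(messages, rng) == faulty_state:
            return attempt
    return None


if __name__ == "__main__":
    # Deterministic replay: every run reaches the same state (10).
    print({run_deterministic(MESSAGES) for _ in range(5)})

    # Non-deterministic replay: the same inputs reach several states.
    rng = random.Random()
    print({run_nondeterministic(MESSAGES, rng) for _ in range(200)})

    # Treat state 14 as the "fault" and count the replays needed to hit it.
    print(executions_until_fault(MESSAGES, 14, rng))
```

In this sketch, one replay of the recorded inputs is enough to reproduce a fault in the deterministic case, whereas the non-deterministic case needs an unpredictable number of replays that grows as the faulty ordering becomes rarer.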