A distributed computer system includes multiple machines, which together often generate a substantial number of traces. Moreover, each trace usually includes a substantial number of trace records. Complex correlations often exist among the events that the trace records indicate. Because of clock skew between the machines, it is often difficult to determine the order of those events, which makes analysis of the traces challenging.
If the various traces were merely merged based on the local timestamps of the respective machines, some of the causalities associated with the events may be lost in the merged trace due to the clock skew between the machines. For example, if machine A sends a message to machine B, the message is likely to be received at machine B several milliseconds later. However, if the clock of machine A runs ahead of the clock of machine B by more than the transmission delay, the trace record for the send event that is generated at machine A has a timestamp that is greater than the timestamp of the trace record for the receive event that is generated at machine B. When this happens, the merged trace places the receive event before the send event and therefore does not accurately reflect the temporal order of the events. Accordingly, trace analysis performed on the merged trace may be unreliable. For example, it may not be possible to generate an accurate snapshot of the states of the distributed computer system (i.e., a distributed snapshot) or to perform an accurate distributed invariance check of the distributed computer system.
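The effect described above can be illustrated with a minimal Python sketch. The record layout, the function names, and the 10 ms skew are assumptions chosen for illustration only; they are not taken from any particular tracing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    machine: str      # hypothetical machine name
    local_ts_ms: int  # timestamp read from that machine's own clock
    event: str        # "send" or "recv"
    msg_id: int       # identifier shared by a send and its matching receive


def merge_by_local_timestamp(*traces):
    """Naively merge traces by sorting on each record's local timestamp."""
    merged = [rec for trace in traces for rec in trace]
    return sorted(merged, key=lambda rec: rec.local_ts_ms)


def causality_violations(merged):
    """Return ids of messages whose receive record precedes its send record."""
    position = {(rec.msg_id, rec.event): i for i, rec in enumerate(merged)}
    violations = []
    for (msg_id, event), send_pos in position.items():
        if event != "send":
            continue
        recv_pos = position.get((msg_id, "recv"))
        if recv_pos is not None and recv_pos < send_pos:
            violations.append(msg_id)
    return violations


# Assumed scenario: machine A's clock runs 10 ms ahead of machine B's clock.
# A sends at true time 1000 ms (local 1010); B receives at true time 1003 ms.
trace_a = [TraceRecord("A", 1010, "send", 1)]
trace_b = [TraceRecord("B", 1003, "recv", 1)]

merged = merge_by_local_timestamp(trace_a, trace_b)
violations = causality_violations(merged)
```

In this sketch the merged trace lists the receive record before the send record, so `violations` contains the message id even though the message was received after it was sent.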
One proposed solution for ordering trace records in a distributed computer system is referred to as the “Lamport Clock” algorithm. The algorithm requires that whenever a message is sent, the sender's local timestamp is attached to the message. Accordingly, the sender's local timestamp serves as the timestamp of the send event. The receiver of the message assigns a timestamp to the receive event that is greater than the timestamp of the send event, which often requires advancing the timestamp of the receive event beyond the receiver's current local timestamp. The Lamport Clock algorithm is commonly used to generate a distributed snapshot. Theoretically, the Lamport Clock algorithm should enable the traces from the various machines in the distributed computer system to be merged into a consistent order that captures all the causalities associated with the events that are indicated by the trace records. However, in practice, achieving the ordering and capturing the causalities often is not possible for a variety of reasons.
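The timestamping rule described above can be sketched as follows. This is a minimal illustration that assumes the common formulation in which each machine keeps an integer logical counter as its local timestamp; the class and method names are hypothetical.

```python
class LamportClock:
    """Minimal logical clock, assuming one integer counter per machine."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp a send event; the returned value travels with the message."""
        return self.tick()

    def receive(self, sender_ts):
        """Timestamp a receive event so that it exceeds the send timestamp."""
        self.time = max(self.time, sender_ts) + 1
        return self.time


clock_a = LamportClock()
clock_b = LamportClock()

# Machine A records five local events, then sends a message to machine B.
for _ in range(5):
    clock_a.tick()
send_ts = clock_a.send()

# Machine B has recorded no events yet, so its clock is behind A's.
# Receiving the message pushes B's clock past the send timestamp.
recv_ts = clock_b.receive(send_ts)
```

Because `receive` takes the maximum of the receiver's counter and the attached timestamp before incrementing, the receive event is always ordered after the corresponding send event, regardless of how far apart the two counters were.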
For example, the Lamport Clock algorithm requires each machine that produces a trace to implement the algorithm, which is not trivial. In another example, some low-level protocol messages cannot carry a Lamport Clock timestamp. For instance, if an attempt is made to connect to a port of a machine and the port is not being listened on, an Internet Control Message Protocol (ICMP) message may be generated to indicate that the connection has been rejected. However, because the ICMP message is generated by the TCP/IP stack rather than by the application, it is not possible to add a Lamport Clock timestamp to the ICMP message. In yet another example, correlation between machines sometimes is achieved by using correlated timers, rather than by passing messages. For instance, if machine A is aware that machine B started a timer before machine A started its own timer of the same duration, machine A may infer that its timer will expire after the timer of machine B. Because no message is exchanged in this case, there is no message to which a Lamport Clock timestamp could be attached. This inference enables partially synchronous systems (e.g., those that rely on timers) to implement certain behaviors that are theoretically impossible in a completely asynchronous system.