Networks contain communicating nodes. Networks can be wired or wireless, and communications between nodes can be unreliable. Failures in one node, e.g., due to software bugs or sequences of input that were not foreseen when node software was developed, can cause nodes to issue incorrect messages to other nodes, or to fail to issue correct messages to other nodes. This can result in cascading failure, as incorrect messages from one node cause other nodes to behave incorrectly. This problem can occur in any network-embedded system, i.e., any system including numerous intercommunicating processing elements.
This problem is particularly noticeable in wireless sensor networks (WSNs). Nodes in these networks are generally small and include low-power processors and sensors for measuring a characteristic of the node's immediate environment. Examples include temperature sensors and hazardous-gas sensors (e.g., carbon monoxide sensors). Other examples include nodes attached to structural components of bridges or buildings to measure stress or strain in the component around the node's point of attachment.
To debug failures observed in networks, e.g., WSNs, it is helpful to determine the sequence of node interactions prior to the failure. To this end, nodes can store a running log (e.g., in a circular buffer) of messages transmitted (TX) or received (RX). This log information, referred to as a “trace,” can be collected from nodes after a failure occurs. Traces from numerous nodes can be compared and arranged in time order to determine the sequence of node interactions leading up to a failure.
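By way of a non-limiting illustration, the circular-buffer logging and time-ordered merging described above can be sketched as follows. The names `TraceEntry`, `NodeTrace`, and `merge_traces` are hypothetical and chosen only for this sketch. The sketch also assumes that every entry carries a timestamp drawn from a clock shared by all nodes; this is a simplifying assumption that a real WSN may not satisfy.

```python
from collections import deque
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class TraceEntry:
    """One logged message event (hypothetical layout for illustration)."""
    timestamp: int   # ASSUMPTION: tick of a clock shared by all nodes
    node_id: int     # node that recorded the entry
    direction: str   # "TX" for transmitted, "RX" for received
    peer_id: int     # node at the other end of the message
    payload: str


class NodeTrace:
    """Fixed-capacity running log; the oldest entries are overwritten first."""

    def __init__(self, node_id, capacity):
        self.node_id = node_id
        # A deque with maxlen behaves as a circular buffer: appending to a
        # full deque silently discards the oldest element.
        self._buffer = deque(maxlen=capacity)

    def record(self, timestamp, direction, peer_id, payload):
        self._buffer.append(
            TraceEntry(timestamp, self.node_id, direction, peer_id, payload)
        )

    def dump(self):
        """Return the surviving entries, oldest first."""
        return list(self._buffer)


def merge_traces(traces):
    """Merge per-node traces (each already in time order) into one sequence."""
    return list(heapq.merge(*traces, key=lambda entry: entry.timestamp))


# Usage: node 1 can hold only two entries, so its earliest entry is lost.
node_a = NodeTrace(node_id=1, capacity=2)
node_a.record(10, "TX", 2, "hello")
node_a.record(30, "RX", 2, "ack")
node_a.record(50, "TX", 2, "data")   # overwrites the entry at timestamp 10

node_b = NodeTrace(node_id=2, capacity=4)
node_b.record(20, "RX", 1, "hello")
node_b.record(25, "TX", 1, "ack")
node_b.record(60, "RX", 1, "data")

merged = merge_traces([node_a.dump(), node_b.dump()])
print([entry.timestamp for entry in merged])  # prints [20, 25, 30, 50, 60]
```

In this sketch, the merged sequence no longer contains the transmission at timestamp 10, because the two-entry buffer on node 1 overwrote it. This shows how a small buffer can discard part of the history leading up to a failure.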
Traces can store messages sent and received, or other events. For example, a trace can store the power used by each subsystem at various times, external events detected by a sensor, or the flow of control through the software executing on a node. However, tracing may require large buffers, and small WSN nodes generally do not have enough memory to hold them. Various schemes attempt to use compression to fit more trace data in a given buffer size, but most conventional compression schemes use a large buffer to look for patterns across a large block of a dataset, and WSN nodes do not generally have enough buffer space to use these techniques. Moreover, WSN nodes do not always have access to a global time reference, so combining traces from multiple nodes in the correct order can be challenging.
There is a need, therefore, for ways of providing traces that can be used to accurately reconstruct sequences of events, e.g., leading up to a failure.