The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
As data units are routed through different nodes in a network, the nodes may, on occasion, discard, fail to send, or fail to receive data units, thus resulting in the data units failing to reach their intended destination. The act of discarding a data unit, or failing to deliver a data unit, is typically referred to as “dropping” the data unit. Instances of dropping a data unit, referred to herein as “drops” or “packet loss,” may occur for a variety of reasons, such as resource limitations, errors, or deliberate policies. In many cases, the selection of which data units to drop is sub-optimal, leading to inefficient and slower network communications.
Moreover, many devices in networks with complex topologies, such as switches in modern data centers, provide limited visibility into drops and other issues that can occur inside the devices. Such devices can often drop messages, such as packets, cells, or other data units, without providing sufficient information to determine why the messages were dropped.
For instance, it is common for certain types of nodes, such as switches, to be susceptible to “silent packet drops,” where data units are dropped without being reported by the switch at all. Another common problem is known as a “silent black hole,” where a node is unable to forward a data unit due to a lack of valid routing instructions at the node, such as errors or corruption in forwarding table entries. Another common problem is message drops or routing errors due to bugs in particular protocols.
Beyond dropping data units, a variety of other low visibility issues may arise in a node, such as inflated latency. Inflated latency refers to instances where the delay in transmission of a data unit exceeds some user expectation of target threshold.