The invention relates in general to the field of computer-implemented methods and systems for monitoring data communication networks, in particular for monitoring queues at switches or nodes of such networks.
The monitoring, control, management and optimization of large, networked computerized systems (such as datacenters, high-performance computing (HPC) systems, clouds, and transport [street/rail] networks) is a growing challenge, owing to the lack of (network) observability in such systems. Large networks are distributed and decentralized systems that comprise thousands to millions of physical and/or virtual queues, and carry large numbers of packet-based flows.
In the example of a cloud, HPC or datacenter network (DCN), these queues are interconnected in a topological graph, which typically is a k-ary n-tree (such as a fat-tree, Clos or dragonfly topology) or a k-ary n-cube (such as a mesh or hypercube).
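As a purely illustrative sketch (not part of the invention), the scale of such topologies can be quantified for the canonical fat-tree case: a fat-tree built from k-port switches has k pods (each with k/2 edge and k/2 aggregation switches), (k/2)² core switches, and supports k³/4 hosts. Counting one queue per switch port gives a lower bound on the number of queues a monitoring system must cover:

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a fat-tree built from k-port switches.

    Illustrative only; standard fat-tree sizing formulas are used.
    """
    assert k % 2 == 0, "port count k must be even"
    edge = (k // 2) * k        # k pods, k/2 edge switches each
    agg = (k // 2) * k         # k pods, k/2 aggregation switches each
    core = (k // 2) ** 2
    hosts = (k ** 3) // 4
    # At least one queue per switch port; real fabrics multiply this
    # by virtual lanes/priorities, which is why queue counts reach
    # the thousands to millions cited above.
    queues = (edge + agg + core) * k
    return {"edge": edge, "agg": agg, "core": core,
            "hosts": hosts, "queues_lower_bound": queues}
```

For example, a fat-tree of 48-port switches (a common commodity radix) already yields tens of thousands of hosts and over a hundred thousand port queues, before any multiplication by virtual lanes.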
Such DCN and HPC fabrics may convey tera- to peta-scale packet volumes across millions of ephemeral ("mice") and persistent ("elephant") flows. Despite decades of research in communications and transport networks, the nature and characteristics of such traffic remain hard to observe and comprehend; hence the reduced capacity to control, schedule and optimize such networks. In particular, direct observation of the multitude of queues is hardly possible, if at all, inside a large network such as a DCN, particularly at the temporal granularity of a few packets (nanosecond to microsecond scale).
A scheme for building a space-time correlated global sampling system for a multitude of queues has been introduced by A. S. Anghel, R. Birke, and M. Gusat, "Scalable High Resolution Traffic Heatmaps: Coherent Queue Visualization for Datacenters", TMA 2014.