Network telemetry involves the use of automated tools and processes designed to collect measurements and other data at points throughout the network, which can then be used for network monitoring and performance analysis.
Modern packet networks are designed with a scaled-out, centrally managed architecture to support elastic loads, large traffic volumes, and high availability of services. For such networks, traditional telemetry mechanisms, such as polling or streaming mechanisms, are proving inadequate, at least in part because they are slow to report rapidly changing network state, are not readily extensible to future network/system growth, and offer little customization of data collection.
For example, streaming telemetry mechanisms, such as OpenConfig, try to streamline the notification of network state by having the network elements stream the telemetry data up to a central management entity, where the data is stored and processed. While streaming telemetry mechanisms employ extensive offline algorithms to process telemetry data, they are not designed to inherently improve the quality of the data collected.
To improve extensibility and bring flexibility into telemetry data collection, the In-band Network Telemetry (INT) framework was developed for packet networks. INT is implemented in the data plane such that telemetry information is carried in data packets (e.g., in the header of data packets) and can be modified at each hop. The data plane refers to the part of a device's architecture that makes forwarding decisions for incoming packets. For example, routing may be determined by the device using a locally stored table in which the device looks up the destination address of the incoming packet and retrieves the information needed for forwarding.
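The table lookup described above can be illustrated with a minimal Python sketch. The table contents, the `lookup` function name, and the use of longest-prefix matching are assumptions made for illustration; the text above states only that the device looks up the destination address in a locally stored table.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> (next hop, egress port).
# All addresses and port names are illustrative only.
FORWARDING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): ("192.0.2.1", "port1"),
    ipaddress.ip_network("10.1.0.0/16"): ("192.0.2.2", "port2"),
    ipaddress.ip_network("0.0.0.0/0"): ("192.0.2.254", "port0"),
}


def lookup(destination):
    """Return (next hop, egress port) for a destination address, or None.

    Assumes longest-prefix matching: among all prefixes that contain the
    address, the most specific one (largest prefix length) wins.
    """
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]
```

In this sketch, a packet destined for 10.1.2.3 matches all three prefixes and is forwarded according to the most specific entry, 10.1.0.0/16.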
The INT framework relies on programmable data planes to bring flexibility to telemetry data collection. Devices with programmable data planes include network processors or general-purpose central processing units (CPUs) at the low end, and data path programmable switch chips at the high end. With INT, a source switch (or more generally, a source network device) incorporates into the data packet an instruction header that specifies the network state information to be collected. Intermediate INT-capable switches (devices) interpret the instruction header, then collect the desired network state information and insert it in the data packet. The packet eventually reaches a sink switch, where the collected information can be used as needed to monitor and evaluate the operation of the network. Advantages of INT include real-time telemetry rates, low CPU and operating system (O/S) overhead, and the flexibility to programmatically instrument packets to carry useful telemetry data. The programmable data planes used in INT have been explicitly designed for packet networks; however, extending INT mechanisms into optical networks, where there is no notion of data packets, is far from straightforward due to factors such as layering and the presence of purely analog devices.
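The source, intermediate, and sink roles described above can be sketched in Python as follows. This is a simplified model and not the INT wire format: the `Packet` class, the function names, and the state field names (`queue_depth`, `hop_latency_us`) are hypothetical and chosen only to show how an instruction header drives per-hop collection.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    # Hypothetical instruction header: names of the state fields to collect.
    int_instructions: tuple = ()
    # Per-hop records appended by intermediate devices.
    int_metadata: list = field(default_factory=list)


def source_insert(packet, instructions):
    """Source device: attach the instruction header to the packet."""
    packet.int_instructions = tuple(instructions)
    return packet


def transit_process(packet, switch_id, local_state):
    """Intermediate device: read the instructions and append the
    requested local state (None if a field is not available)."""
    if packet.int_instructions:
        record = {"switch_id": switch_id}
        for name in packet.int_instructions:
            record[name] = local_state.get(name)
        packet.int_metadata.append(record)
    return packet


def sink_extract(packet):
    """Sink device: remove the telemetry from the packet and return it
    together with the restored packet."""
    report = list(packet.int_metadata)
    packet.int_instructions = ()
    packet.int_metadata = []
    return report, packet
```

A packet that traverses two intermediate devices would arrive at the sink with two records, one per hop, each containing only the fields named in the instruction header.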
The emergence of integrated packet and optical networks, or “packet-optical networks”, such as those interconnecting data centers, introduces additional challenges for network telemetry because of the different types of telemetry data collected in packet versus optical networks. For example, telemetry data collected on a per-flow basis in the packet layer of a packet network, such as packet loss and latency, cannot be easily attributed to or correlated with data collected in the optical layer of an optical network, such as bit error rates (BERs) and quality factor (Q-factor). Even in the scenario that the telemetry data collected in the packet layer does not indicate any errors in the optical layer, it is not possible to monitor any layer in the optical network for a specific duration of time using existing telemetry solutions because the optical network lacks the digital constructs used by existing telemetry solutions, and the packet layer does not have access to measurements in the optical network. Optical parameters may affect traffic flows. For example, if a link experiences degradation in Q-factor without link failure, operators can use that information to proactively move critical applications away from that particular link. Thus, it would be useful for network operators to be able to monitor optical parameters over time for use in routing and other applications.
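As an illustration of how monitored optical parameters might inform routing, the following Python sketch flags links whose Q-factor has stayed below a threshold over several consecutive samples, even though the links have not failed. The function name, the sample values, the threshold, and the window length are all hypothetical; the text above does not prescribe any particular detection rule.

```python
def degraded_links(q_samples, min_q_db, window=3):
    """Return the links whose last `window` Q-factor samples are all
    below `min_q_db`.

    q_samples maps a link name to a time-ordered list of Q-factor
    readings in dB. Links with fewer than `window` samples are skipped.
    """
    flagged = []
    for link, samples in q_samples.items():
        recent = samples[-window:]
        if len(recent) == window and all(q < min_q_db for q in recent):
            flagged.append(link)
    return sorted(flagged)


# Illustrative readings only; real values would come from optical devices.
samples = {
    "A-B": [9.5, 9.4, 9.6],
    "B-C": [9.0, 7.9, 7.8, 7.5],
    "C-D": [7.0],
}
candidates = degraded_links(samples, min_q_db=8.0)
```

An operator could treat the flagged links as candidates for moving critical applications elsewhere. Requiring several consecutive low samples is one simple way to avoid reacting to a single noisy reading.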
Another challenge is that there is no standard mechanism to transfer telemetry data between layers except when failures occur. A further challenge occurs in associating packet flow telemetry data with the corresponding data from optical transport network (OTN) layers, which involves piecing together telemetry data from many devices. Existing telemetry solutions, including INT, do not address these challenges in packet-optical networks and are thus unable to achieve end-to-end correlation of collected network state data in packet-optical networks.