1. Field
The present embodiments relate to techniques for analyzing telemetry data. More specifically, the present embodiments relate to a method and system for enhancing bandwidth associated with sampling telemetric signals from a monitored system.
2. Related Art
As electronic commerce becomes more prevalent, businesses are increasingly relying on enterprise computing systems to process ever-larger volumes of electronic transactions. A failure in one of these enterprise computing systems can be disastrous, potentially resulting in millions of dollars of lost business. More importantly, a failure can seriously undermine consumer confidence in a business, making customers less likely to purchase goods and services from the business. Hence, it is important to ensure high availability in such enterprise computing systems.
To achieve high availability, it is necessary to capture unambiguous diagnostic information that can quickly locate faults in hardware or software. If a system performs too little event monitoring, service engineers may be unable to quickly identify the source of a problem that arises at a customer site. This can lead to increased downtime.
Fortunately, high-end computer servers are now equipped with a large number of sensors that measure physical performance parameters such as temperature, voltage, current, vibration, and acoustics. Software-based monitoring mechanisms also monitor software-related performance parameters, such as processor load, memory and cache usage, system throughput, queue lengths, I/O traffic, and quality of service. Typically, special software analyzes the collected telemetry data and issues alerts when there is an anomaly. In addition, it is important to archive historical telemetry data to allow long-term monitoring and to facilitate detection of slow system degradation.
Moreover, an increase in the number of components within computer servers has resulted in a corresponding increase in sensor density. For example, thousands of sensors may be used to monitor the various components of a large computer server. Dynamic monitoring techniques for computer servers may further require that each sensor be sampled at or above a certain rate. In turn, the use of additional sensors to collect telemetry data at high sampling rates has resulted in higher bandwidth demands associated with sampling the telemetry data.
However, system buses that collect and transmit the telemetry data typically have bandwidth limitations that prevent the telemetry data from being sampled beyond a certain rate. For example, telemetry data collected using an Inter-Integrated Circuit (I2C) system bus may be limited to 3.4 megabits per second. Because all sensors on the bus share this bandwidth, an increase in sensor density within a computer server may cause the sampling rate of one or more sensors to fall. For example, a computer server with an I2C system bus and thousands of sensors may be so bandwidth-limited that each sensor may only be sampled once per minute or less frequently. Dynamic monitoring and integrity analysis techniques that require frequent sampling of sensors may thus be impeded by such slow sampling rates.
Hence, what is needed is a technique for increasing the bandwidth associated with collecting telemetry data in monitored computer systems.
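The relationship between shared bus bandwidth and per-sensor sampling rate described above can be illustrated with a rough calculation. The following is a minimal sketch, not a model of any particular server: the function name, the sensor counts, and the 34-bit transaction size are hypothetical values chosen for round numbers, and the calculation assumes that the full 3.4 megabits per second is divided evenly among the sensors with no other overhead.

```python
def max_sampling_rate_hz(bus_bits_per_second, num_sensors, bits_per_sample):
    """Idealized upper bound on the per-sensor sampling rate of a shared bus.

    Assumes every sensor is polled equally often and that each sample costs
    exactly bits_per_sample bits on the bus (an illustrative simplification).
    """
    if num_sensors <= 0 or bits_per_sample <= 0:
        raise ValueError("num_sensors and bits_per_sample must be positive")
    return bus_bits_per_second / (num_sensors * bits_per_sample)


# 3.4 megabits per second, the I2C limit cited in the text above.
I2C_BUS_BPS = 3_400_000

# Hypothetical sensor counts and transaction size (assumptions, not
# figures from the text).
for sensors in (1_000, 5_000, 10_000):
    rate = max_sampling_rate_hz(I2C_BUS_BPS, sensors, bits_per_sample=34)
    print(f"{sensors} sensors: at most {rate:.1f} samples/s per sensor")
```

Under these assumptions the bound falls in inverse proportion to the number of sensors, so doubling the sensor count halves the achievable sampling rate. The figure is only a ceiling; rates achieved in practice may be considerably lower once protocol overhead, bus contention, and the polling capacity of the collecting processor are taken into account.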