1. Field
The present embodiments relate to techniques for monitoring and analyzing computer systems. More specifically, the present embodiments relate to a method and system for inferring the altitude of a computer system by analyzing telemetry data from the computer system.
2. Related Art
As described by Moore's Law, integrated circuit (IC) performance continues to increase at an exponential rate. However, these performance improvements are typically accompanied by corresponding increases in complexity and sensitivity to the environment. In particular, vibration-related problems are increasingly prominent in high-performance computer systems such as servers, mainframes, and supercomputers. These vibration-related problems may be caused by several factors. First, cooling fans have increased in power to compensate for the additional heat generated by the machines' hardware components. In addition, cheaper and lighter materials in chassis and support structures are less effective at damping vibrations than heavier and more expensive materials. Finally, newer generations of hard disk drives (HDDs) are more sensitive to vibration-induced degradation.
More specifically, HDDs have increased in both storage density and performance in accordance with Moore's Law. At these increased densities, a write head of an HDD may be required to hit a track that is less than 20 nanometers in width, while the write head may be separated from a corresponding platter by a distance of several nanometers. In addition, the platter may spin at speeds of up to 15,000 revolutions per minute (rpm). These factors have caused the latest generation of HDDs to be more sensitive to vibrations. Consequently, vibration-related problems may cause the HDDs within a computer system to experience reductions in read and write throughput. Moreover, the increased internal latencies caused by the degraded throughput may cause software applications to hang, crash, and/or reboot.
Similarly, single-event upsets (SEUs) from cosmic radiation may affect processor and/or memory state. SEUs may also propagate to become soft errors in computer systems. Because sensitivity to SEUs increases with higher gate densities and lower voltages, soft error rates (SERs) may grow with successive generations of ICs.
Furthermore, altitude may affect the propagation of both vibrations and SEUs in computer systems. First, thinner air at higher altitudes may result in higher fan speeds that increase vibration-induced degradation, noise, and/or energy consumption. Higher fan speeds may additionally produce vibrations at the resonant frequency of a chassis, resulting in accelerated failure rates for machines and components at that altitude.
Second, cosmic ray flux may increase by a factor of about 2.2 for every 1,000 m increase in altitude. As a result, SERs for computers at high altitude may be an order of magnitude higher than SERs for computers at sea level. Consequently, the use of soft error rate discrimination (SERD) thresholds that do not account for altitude may result in a large number of false alarms and/or poor discrimination sensitivity.
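The scaling described above can be illustrated with a minimal sketch. It assumes the flux grows by a constant factor of about 2.2 per 1,000 m across the whole altitude range, which is a simplification of the figure stated above. The function names and the idea of scaling a sea-level threshold in direct proportion to flux are hypothetical and serve only to show why a threshold that ignores altitude may produce false alarms.

```python
def flux_multiplier(altitude_m: float, factor_per_km: float = 2.2) -> float:
    """Relative cosmic ray flux versus sea level.

    Assumes the flux is multiplied by factor_per_km for every
    1,000 m of altitude (an idealized exponential model).
    """
    return factor_per_km ** (altitude_m / 1000.0)


def altitude_adjusted_threshold(sea_level_threshold: float,
                                altitude_m: float) -> float:
    """Hypothetical adjustment: scale a sea-level soft error rate
    threshold in direct proportion to the relative flux."""
    return sea_level_threshold * flux_multiplier(altitude_m)


if __name__ == "__main__":
    for altitude_m in (0, 1000, 2000, 3000):
        print(altitude_m, round(flux_multiplier(altitude_m), 2))
```

Under this assumption, a system at 3,000 m sees roughly 2.2 × 2.2 × 2.2 ≈ 10.6 times the sea-level flux, which is consistent with the order-of-magnitude difference in SERs noted above.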
Hence, altitude information may facilitate the diagnosis and remediation of degradation in computer systems.