1. Field of the Invention
Embodiments of the present invention relate to techniques for improving the reliability of computer systems. More specifically, embodiments of the present invention relate to a method and an apparatus that improves the reliability of a computer system by selectively mitigating multiple vibration sources within the computer system.
2. Related Art
Computer systems such as servers and storage arrays can be adversely affected by mechanical vibrations of internal system components and structures. These vibrational problems are becoming more significant because of the following trends: (1) cooling fans are becoming increasingly powerful; (2) chassis and support structures are becoming weaker because of design modifications that reduce cost and weight; and (3) internal disk drives, power supplies, and other system components are becoming more sensitive to vibration-induced degradation. For example, hard disk drives (HDDs) are becoming more sensitive to vibration because the storage density for HDDs has increased to the point where a write head has to align with a track that is less than 20 nanometers wide. Moreover, the write head floats only 7 nanometers above the disk surface. These extremely small dimensions make the read and write performance of the HDDs very sensitive to vibrations. Even low levels of sustained vibration can significantly degrade the I/O performance of the HDDs.
Cooling fans are a significant source of mechanical vibrations in a computer system. More specifically, servers and storage arrays are typically equipped with a large number of cooling fans (e.g., twelve or more) that operate at very high speeds. Because fan speeds in most computer systems are not actively controlled, the speed of each fan typically varies from the designed speed. One problem associated with fan speed variation is that it can excite vibrational resonances within a computer system's mechanical structure. Specifically, if a fan's operating speed or an associated harmonic coincides with an internal vibrational resonance of the computer system, there can be a significant resonance-related amplification of the vibration, which can cause HDDs to fail or perform poorly. Note that an even more direct cause of HDD failure arises when the frequency of a fan-induced mechanical vibration coincides with the internal vibrational resonance of the HDD, which is typically associated with the rotation speed of the HDD during operation. Hence, it is highly desirable to mitigate cooling-fan-induced mechanical vibrations in order to improve the health of HDDs.
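The coincidence condition described above can be illustrated with a minimal sketch. It assumes that a fan's fundamental vibration frequency in Hz equals its rotation speed in RPM divided by 60 and that its harmonics are integer multiples of that fundamental. The function name, the tolerance, and the example speeds are hypothetical values chosen for illustration only; they are not taken from any particular system.

```python
def coinciding_harmonics(fan_rpm, resonance_hz, max_harmonic=5, tolerance_hz=2.0):
    """Return (harmonic number, frequency in Hz) pairs for fan harmonics
    that fall within tolerance_hz of a given resonance frequency.

    Assumption: the fan's fundamental frequency is fan_rpm / 60 and its
    harmonics are integer multiples of that fundamental.
    """
    fundamental_hz = fan_rpm / 60.0
    hits = []
    for n in range(1, max_harmonic + 1):
        harmonic_hz = n * fundamental_hz
        if abs(harmonic_hz - resonance_hz) <= tolerance_hz:
            hits.append((n, harmonic_hz))
    return hits


# Hypothetical example: an HDD spinning at 7200 RPM has a rotation
# frequency of 7200 / 60 = 120 Hz, used here as the resonance of interest.
hdd_resonance_hz = 7200 / 60.0
print(coinciding_harmonics(3600, hdd_resonance_hz))  # second harmonic of a 3600 RPM fan
print(coinciding_harmonics(3000, hdd_resonance_hz))  # no harmonic near 120 Hz
```

In this illustration, a fan running at 3600 RPM has a second harmonic at 120 Hz, which matches the assumed HDD rotation frequency, whereas a fan at 3000 RPM has no harmonic within the tolerance.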
A “brute force” approach to mitigating vibration-induced HDD failures involves: (1) identifying HDDs which are adversely affected by the vibrations; (2) identifying components that are generating the vibrations (e.g., cooling fans or other HDDs); and (3) taking action to mitigate vibrations for the components identified in steps (1) and (2).
Unfortunately, the above-described approach can be extremely costly and inefficient in practice. Note that the mitigation action often involves inserting rubber or foam dampers, grommets, or stiffeners in available spaces around identified vibration sources and vibration-sensitive HDDs in an effort to isolate these components from the rest of the computer system. For a simple computer system that comprises only a few vibration-sensitive HDDs and only a few significant vibration sources, this is a relatively easy task. For example, in a system with a single cooling fan and a single HDD, one can place isolation materials around the fan and/or the HDD until the desired level of HDD performance is achieved. However, for more complex computer systems containing multiple HDDs and multiple fans (e.g., servers and storage arrays), the mitigation process can become substantially more complex. For example, in a large server that contains 14 internal fans and 48 internal HDDs, it becomes extremely difficult to determine which vibration sources (i.e., fans and HDDs) cause problems for specific HDDs. In the example server system described above, one would have to experimentally assess a total of 2,928 unique pair-wise combinations of components, i.e., each of the 48 HDDs paired with each of 61 vibrating components (14 fans plus 47 other HDDs).
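The pair-wise count above follows from simple arithmetic: each HDD is a potential victim of every fan and of every other HDD. A short sketch of that calculation is given below; the function name is hypothetical and is introduced only for illustration.

```python
def pairwise_assessments(num_fans, num_hdds):
    """Count (vibration-sensitive HDD, vibration source) pairs to assess.

    Each HDD is paired with every fan and with every other HDD,
    since an HDD is not treated as a vibration source for itself.
    """
    sources_per_hdd = num_fans + (num_hdds - 1)
    return num_hdds * sources_per_hdd


# Example server from the text: 14 fans and 48 HDDs.
# 48 * (14 + 47) = 48 * 61 = 2928
print(pairwise_assessments(14, 48))
```

Because the count grows with the product of the number of HDDs and the number of sources, adding drives to a system increases the experimental burden roughly quadratically.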
One option is to apply mitigation to every component in the server in an attempt to isolate each component that vibrates (which includes all fans and all HDDs) from each component that is vibration sensitive (i.e., the HDDs). However, this option entails an enormous amount of work and can result in significant additional cost and complexity.
Hence, what is needed is a method and an apparatus that facilitates mitigating multiple vibration problems without the above-described issues.