1. Field of the Invention
The present invention relates to techniques for enhancing the reliability of computer systems. More specifically, the present invention relates to a method and an apparatus for optimizing synchronization between monitored signals within a computer system.
2. Related Art
As electronic commerce becomes more prevalent, businesses increasingly rely on enterprise computing systems to process ever-larger volumes of electronic transactions. A failure in one of these enterprise computing systems can be disastrous, potentially resulting in millions of dollars of lost business. More importantly, a failure can seriously undermine consumer confidence in a business, making customers less likely to purchase goods and services from the business. Hence, it is critically important to ensure high availability in such enterprise computing systems.
To achieve high availability in enterprise computing systems, it is necessary to be able to capture unambiguous diagnostic information that can quickly pinpoint the source of defects in hardware or software. If a system has too little event monitoring, service engineers may be unable to quickly identify the source of a problem that arises at a customer site. This can lead to increased downtime, which can adversely impact customer satisfaction and loyalty.
Fortunately, high-end computer servers, such as those manufactured by SUN Microsystems, Inc. of Santa Clara, Calif., are now equipped with over 1000 sensors that measure variables such as temperature, voltage, current, vibration, and acoustics. Furthermore, software-based monitoring mechanisms monitor system performance parameters such as processor load, memory and cache usage, system throughput, queue lengths, I/O traffic, quality of service, and security. In addition, many high-end computer servers have embedded diagnostic systems and online statistical process control techniques that collect and analyze process variables in real time. For example, SUN Microsystems, Inc. is developing a variety of tools for monitoring high-end servers.
These monitoring tools provide proactive fault monitoring based on telemetry signals. However, in many high-end servers, the monitored signals are non-synchronous. Processes can speed up and slow down depending on many factors. Over time, signals generated by different processes can drift even further out of synchronization, which can greatly complicate the process of correlating the signals. This is problematic because most types of statistical pattern recognition mechanisms require input data streams to be synchronized.
Hence, what is needed is a method and an apparatus for optimizing synchronization between the telemetry signals from a computer system.