Advances in software programming and computing technology have made increasingly sophisticated and feature-rich software applications available to consumers and businesses. For businesses in particular, these powerful software applications provide benefits in terms of improved accuracy, efficiency, and convenience for numerous tasks performed on a regular basis. Today's enterprises depend on such software applications for most aspects of their businesses. Typically, large enterprises organize their computing resources into multiple data centers, each data center being a pool of computing resources and storage that may be physically separated from the other data centers. The software applications run in such data centers, and end users' requests to such software applications flow into one or more data centers of the enterprise.
Most such software applications include a large number of application components, arranged in multiple tiers and spread across multiple servers within the data centers. Smooth operation of such software applications depends on successful detection and localization of performance faults that arise in the data centers during operation of such software applications. Thus, to keep such software applications continuously available, particularly those considered business-critical, enterprises need automatic, real-time detection of performance problems resulting from software or hardware faults in the data centers, followed by localization and correction of those faults.
Several approaches for fault detection and localization in data centers have been proposed in the past few years. Such approaches involve introducing monitors or probes in the data centers. However, the effectiveness of fault detection and localization under such approaches varies with the number and the type of monitors introduced in the data centers.