Computer software applications are increasingly designed to run in clusters, i.e., as multiple replicated instances on one or more computer systems, which, for illustration purposes, may be referred to as “hosts.” Advantages of clustered applications include scalability, robustness, and economy. Scalability refers to the ability of the application to expand its computing capacity easily, and robustness refers to the fact that the entire application is unlikely to fail at once. For example, if one instance of the application crashes, or even if an entire host containing several instances crashes, the other instances and hosts can continue to function. Clustered applications are economical because they are frequently deployed on many inexpensive computers, yet can provide as much computing power as one large, much more expensive computer.
However, the presence of many application instances on many hosts makes monitoring and management of these applications significantly more difficult. Monitoring the health of the applications in a live production environment and managing their behavior can become expensive and inefficient. Further, current monitoring and management solutions generally do not provide in-depth profiling information about applications. Even when they do, detailed profiling is computationally expensive and can degrade application and/or system performance. Therefore, in-depth profiling information is usually not collected in deployment environments. Instead, to keep performance acceptable, deployment-time monitoring typically provides only coarse, general information about application behavior. Such information is usually of limited use to the user, however, because it normally lacks the detail needed to indicate the root cause of an application problem.