Some large-scale software applications are distributed in order to make efficient use of the computing power available in multiple computer systems. Not only are complex software applications distributed across multiple systems, but the computer systems themselves may also be distributed among multiple data centers. The distances that separate data centers may range from city-to-city to country-to-country or even continent-to-continent, depending on the needs of the application. In addition to the application itself, the services provided by a software application may also be distributed among data centers.
Within a data center, various computing systems (“servers”) may be interconnected to form clusters that are dedicated to performing one or more tasks associated with one or more software applications. The number of servers in a cluster may be selected as a function of the anticipated computing demands of the software application and the computing capacity of the servers.
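As an illustration only, the relationship between anticipated demand, per-server capacity, and cluster size might be sketched as follows. The function name, the use of requests per second as the unit of demand, and the headroom factor are all assumptions made for this example; they are not drawn from any particular system.

```python
import math


def servers_needed(peak_demand: float,
                   per_server_capacity: float,
                   headroom: float = 0.25) -> int:
    """Estimate a cluster size from anticipated demand and server capacity.

    Hypothetical units: peak_demand and per_server_capacity are both in
    requests per second. headroom is an assumed safety margin, expressed
    as a fraction of peak demand.
    """
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    if peak_demand < 0 or headroom < 0:
        raise ValueError("peak_demand and headroom must be non-negative")

    # Scale demand by the safety margin, then divide by what one server handles.
    required = peak_demand * (1.0 + headroom) / per_server_capacity

    # Round up to whole servers; a cluster has at least one server.
    return max(1, math.ceil(required))


if __name__ == "__main__":
    print(servers_needed(peak_demand=1000, per_server_capacity=100))
```

A real sizing exercise would also account for factors this sketch ignores, such as redundancy requirements and uneven load across servers.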
Deploying a distributed application in a large-scale computing environment is a complex task. Network management systems can measure and analyze system traffic. However, it is difficult to associate the measured traffic with individual applications in such large-scale environments. The process of selecting which services should be provided by which data centers, which servers should be clustered, and which tasks should be performed by which clusters requires significant analysis of the computing demands of the application and the capacities of both the hardware and the software. Furthermore, the software itself may be configurable to be more responsive to users. For example, application tasks are assigned to application processes, and each process may have a certain capacity for processing tasks, such as a thread count.
Metrics traditionally used in system management usually refer to utilization or to individual machine parameters such as central processing unit (CPU) and storage. However, these parameters are difficult to correlate with distributed applications. Current system management approaches have weaknesses in deriving information from their information bases to provide higher-level perspectives on the behavior of large-scale distributed systems.