With the advent of Internet applications, the demands placed on computing systems have increased dramatically. Many businesses, for example, have made significant investments in Internet technology to support growing electronic businesses such as e-commerce. Because companies rely on an ever-increasing amount of network commerce to support their businesses, computing systems have generally become more complex in order to substantially ensure that servers providing network services never fail. Consequently, system reliability is an important aspect of the modern business model.
A first approach for providing powerful and reliable services may involve a large multiprocessor system (e.g., a mainframe) for managing a server. Since more than one processor may be involved within a large system, services may continue even if one of the plurality of processors fails. Unfortunately, these large systems may be extraordinarily expensive and may be available to only the largest of corporations. A second approach for providing services may involve employing a plurality of less expensive systems (e.g., off-the-shelf PCs) individually configured as an array to support the desired service. Although these systems may provide a more economical hardware solution, system management and administration of individual servers is generally more complex and time-consuming.
Currently, management of a plurality of servers is a time-intensive and problematic endeavor. For example, managing server content (e.g., software, configuration, data files, components, etc.) requires administrators to explicitly distribute (e.g., manually and/or through custom script files) new or updated content and/or configurations (e.g., web server configuration, network settings, etc.) across the servers. If a server's content becomes corrupted, an administrator often has no automatic means of correcting the problem. Furthermore, configuration, load-balance adjustment, load-balance tool selection, and monitoring generally must be achieved via separate applications. Thus, management of the entity (e.g., a plurality of computers acting collectively) as a whole generally requires individual configuration of loosely coupled servers, which increases both the number of errors and the time expended.
Presently, there is no straightforward and efficient system and/or process for providing system-wide operational metric data for a collection of servers. Additionally, there is no system and/or process for providing system-wide operational metric data for a collection of arrays of servers. Some applications may exist that provide operational metrics for an individual server; however, these applications generally do not provide operational metrics across the logical collection of loosely coupled servers. For example, it is often important to view information from the collection of servers as a whole to determine relevant system-wide performance. Obtaining a quick view of pertinent operational metrics (e.g., performance, status, health, events) associated with the plurality of servers may be problematic, since each server generally must be searched independently. Downloading all operational metric information from each individual server would overwhelm the network, and reviewing all of that information to find problems or determine the state of the array would be extremely cumbersome for an administrator. Furthermore, the complexity would be substantially increased for a collection of arrays.