Enterprise computing systems, including networked enterprise computing systems, continue to grow in scale and in the complexity of their components and interactions. Today's large-scale networked enterprise computing systems and services exhibit complex behaviors stemming from the interaction of workload, software structure, hardware, network traffic conditions, and system goals, such as service level objectives and agreements. Because of this considerable growth in both scale and complexity, system and network management has become correspondingly more difficult.
Typically, a system management product is employed within a network. Its purpose is to administer and monitor the different servers running on the network. For example, a system management application can watch an application server or Web server and report how much memory it is using, whether it has crashed, or whether it is up and running normally.
There are typically two main components in a system management product: a system management server component and an agent component. The server component is installed on a server, while an agent component is installed on each client of the network. Each agent is responsible for monitoring the software running on its own computer. When an agent detects new information about the software it is monitoring, it sends that data to the system management server. Therefore, all agents in the network need to send messages to the server.
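The agent and server relationship described above can be illustrated with a minimal sketch. The class names `ManagementServer` and `Agent`, the `observe` method, and the status fields are hypothetical and chosen only for illustration; they do not correspond to any particular system management product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ManagementServer:
    """Hypothetical stand-in for the system management server component."""
    received: List[Tuple[str, Dict[str, object]]] = field(default_factory=list)

    def receive(self, agent_id: str, data: Dict[str, object]) -> None:
        # Record which agent reported what.
        self.received.append((agent_id, data))


@dataclass
class Agent:
    """Hypothetical agent that monitors software on its own computer."""
    agent_id: str
    server: ManagementServer
    last_seen: Optional[Dict[str, object]] = None

    def observe(self, status: Dict[str, object]) -> bool:
        """Send status to the server only when it differs from the last report."""
        if status == self.last_seen:
            return False  # nothing new, so no message is sent
        self.last_seen = dict(status)
        self.server.receive(self.agent_id, dict(status))
        return True


server = ManagementServer()
agent = Agent("host-1", server)
agent.observe({"running": True, "memory_mb": 512})  # new information: sent
agent.observe({"running": True, "memory_mb": 512})  # unchanged: not sent
agent.observe({"running": False, "memory_mb": 0})   # change detected: sent
print(len(server.received))  # prints 2
```

In this sketch an agent sends a message only when the observed status changes, which is one plausible reading of "new information"; other reporting policies, such as periodic reports, are equally possible.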
As a network grows, there may be hundreds or thousands of agents running but only one server. If all the agents send data to the server at once, either the server is flooded with more messages than it can handle or one or more agents starve (i.e., are unable to send messages to the server because other agents are taking up all of the server's time).
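The starvation case can be made concrete with a small simulation. This is only a sketch under simplifying assumptions that are not taken from any real product: every agent sends one message per cycle, messages arrive in agent-id order, and the single server handles a fixed number of messages per cycle and discards the rest. The function name `simulate` and its parameters are hypothetical.

```python
from typing import Dict, List


def simulate(num_agents: int, capacity_per_cycle: int, cycles: int) -> Dict[int, int]:
    """Return how many messages each agent got through to the single server.

    Assumption: every agent sends one message per cycle, and the server
    handles at most `capacity_per_cycle` messages per cycle in arrival
    order, discarding the rest.
    """
    delivered = {agent: 0 for agent in range(num_agents)}
    for _ in range(cycles):
        # All agents send at once; arrival order is assumed to be by agent id.
        arrivals: List[int] = list(range(num_agents))
        for agent in arrivals[:capacity_per_cycle]:
            delivered[agent] += 1
    return delivered


delivered = simulate(num_agents=1000, capacity_per_cycle=100, cycles=10)
starved = [a for a, n in delivered.items() if n == 0]
print(len(starved))  # prints 900
```

Under these assumptions the same 100 agents are served in every cycle and the remaining 900 never reach the server, however many cycles elapse.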
One typical solution is to build a cluster of redundant servers, such that one subset of agents communicates with one server, a different, disjoint subset of agents communicates with another server, and the servers aggregate the data among themselves. However, such a solution is not applicable where only one server is available.
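The clustered arrangement can be sketched as follows. The round-robin assignment and the dictionary merge are assumptions made for illustration; the text does not specify how agents are divided among servers or how the servers aggregate their data, and the names `assign_agents` and `aggregate` are hypothetical.

```python
from typing import Dict, List


def assign_agents(agent_ids: List[str], num_servers: int) -> List[List[str]]:
    """Split agents into disjoint subsets, one per server (round-robin)."""
    subsets: List[List[str]] = [[] for _ in range(num_servers)]
    for index, agent_id in enumerate(agent_ids):
        subsets[index % num_servers].append(agent_id)
    return subsets


def aggregate(per_server: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge each server's view into one combined view of all agents."""
    combined: Dict[str, str] = {}
    for view in per_server:
        combined.update(view)
    return combined


agents = [f"agent-{i}" for i in range(6)]
subsets = assign_agents(agents, num_servers=2)
# Each server records a status only for the agents assigned to it.
views = [{a: "up" for a in subset} for subset in subsets]
print(subsets[0])             # ['agent-0', 'agent-2', 'agent-4']
print(len(aggregate(views)))  # 6
```

With `num_servers=1` the function places every agent in a single subset, which is the single-server situation the clustered approach does not address.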