A managed network (e.g., an enterprise network) often includes a large number of machines and devices configured to perform a wide variety of functions. The number of computing assets, and the amount of data generated and used by those assets, scale rapidly with the size of the network. System and resource management on a network, such as collecting real-time information regarding systems and resources in the network and dynamically modifying and reallocating resources and data in the network, requires a substantial amount of computation and communication resources.
In a centrally managed network, a central management server is responsible for issuing requests (e.g., requests for status updates, system management operations, and network management operations) to the targeted destination nodes in the network. These requests often take a long time to propagate through the network to the appropriate destination nodes. These latencies make real-time management of the machines in the network difficult. For example, it typically takes more time to collect information about the status of machines coupled to the network than it takes for that status to change. Frequently, by the time the requested status information is received by an administrator, such information has already become outdated. In addition, in a centrally managed network, the central server can quickly become overwhelmed by the communication load and become a management bottleneck. Furthermore, a centralized management scheme is expensive to implement and maintain.
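The staleness problem described above can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch rather than a model of any particular system: the function names, the node count, the per-request latency, the concurrency limit, and the status-change interval are all hypothetical values chosen for illustration, and the model assumes the central server polls nodes in fixed-size batches that each take one request latency.

```python
import math


def sweep_seconds(node_count: int, seconds_per_request: float,
                  concurrent_requests: int) -> float:
    """Estimate the time for one central server to poll every node once.

    Assumes (hypothetically) that requests are issued in batches of
    `concurrent_requests`, and that each batch takes `seconds_per_request`.
    """
    batches = math.ceil(node_count / concurrent_requests)
    return batches * seconds_per_request


def report_is_stale(sweep: float, seconds_between_changes: float) -> bool:
    """A full report is stale if collecting it takes longer than the
    typical interval between status changes on a node."""
    return sweep > seconds_between_changes


if __name__ == "__main__":
    # All figures below are illustrative assumptions, not measurements.
    sweep = sweep_seconds(node_count=100_000,
                          seconds_per_request=0.5,
                          concurrent_requests=4)
    print(f"one full sweep takes about {sweep / 3600:.1f} hours")
    print("report stale:", report_is_stale(sweep, seconds_between_changes=600))
```

Under these assumed figures a single sweep takes 12,500 seconds, far longer than the assumed ten-minute interval between status changes, so the collected report would be outdated before it is complete. Raising the concurrency limit shortens the sweep but increases the communication load on the central server.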
Some conventional systems attempt to ameliorate the problems of a centralized management scheme by performing some degree of aggregation or processing of data at intermediate control levels, resulting in a hierarchical management structure between the network administrator and the end nodes. These systems also do not scale well. For example, for a network with 100,000 nodes, it may still take several hours or more to report the status of those individual nodes, or even of an aggregate thereof. In that timeframe, many nodes would likely have changed their status, making the status report obsolete. In addition, these hierarchical management structures themselves are difficult and complex to create and maintain, and are prone to problems and failures.
Other conventional systems amass information about network devices into one or more relatively large databases, so that network operators can query those databases for information about devices in the network. These systems also do not scale well. A relatively large network would produce enough data to swamp the operations of a database. One likely consequence is that only a small number of database queries can be made within the resource limits of the database or its servers. Another problem with these systems is that, by the time answers are aggregated, the data tend not to reflect the true state of the devices in the network. Moreover, because the data are collected over time, they no longer represent a consistent snapshot view of those devices.