Cloud computing is an outcome of the development and integration of traditional computer and network techniques such as Grid Computing, Distributed Computing, Parallel Computing, Utility Computing, Network Storage Technologies, Virtualization, Load Balancing, and the like. It aims to integrate multiple relatively low-cost computing entities into a single system with a powerful computing capability, and to distribute that computing capability to each computing entity, that is, to each node. The basic concept of cloud computing stems from earlier distributed computing techniques: the two have essentially similar system architectures, adopt essentially similar system management methods, and both use a distributed equipment management mode. However, with the rapid development of distributed computing and cloud computing, both raise new requirements on the scale, reliability, general applicability, and expandability of the system, and the traditional distributed equipment management mode shows its drawbacks in meeting these requirements.
FIG. 1 is a diagram of a currently common topological structure of distributed equipment management. As shown in FIG. 1, most traditional equipment management modes adopt a preconfigured tree structure, wherein the level number increases level by level from the controlled nodes at the bottom level to the control node at the top level. A control node 11 manages a controlled node 12, sending control instructions to it and collecting running information from it. The control node at the top level provides an external system management interface, receives a control instruction from a client 13, and distributes the received instruction level by level to complete the management process.
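The tree-structured management mode described above can be sketched as follows. This is only an illustrative model under stated assumptions, not an implementation taken from FIG. 1: the class name `Node`, the methods `distribute` and `collect`, and the node names are all hypothetical, and "running information" is reduced to a count of instructions received.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a preconfigured management tree (hypothetical model)."""
    name: str
    children: list[Node] = field(default_factory=list)
    received: list[str] = field(default_factory=list)

    @property
    def level(self) -> int:
        # Controlled nodes at the bottom are level 0; a control node sits
        # one level above its highest child.
        if not self.children:
            return 0
        return 1 + max(child.level for child in self.children)

    def distribute(self, instruction: str) -> int:
        """Record the instruction, pass it down level by level, and
        return the number of nodes it reached."""
        self.received.append(instruction)
        return 1 + sum(child.distribute(instruction) for child in self.children)

    def collect(self) -> dict[str, int]:
        """Collect running information (here: instructions seen per node)
        from this node and its whole subtree."""
        info = {self.name: len(self.received)}
        for child in self.children:
            info.update(child.collect())
        return info


# The client hands one instruction to the top-level control node.
top = Node("top", [
    Node("control-a", [Node("controlled-1"), Node("controlled-2")]),
    Node("control-b", [Node("controlled-3")]),
])
reached = top.distribute("restart")
```

In this sketch every instruction enters through `top`, so the top-level control node is the only path between the client and the rest of the system.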
When the capacity of a traditional distributed system needs to be expanded, new equipment is added by modifying the physical topological structure of the system. Before the new equipment is connected, the system status must be estimated in advance and a proper node selected so as to avoid a performance bottleneck. As the number of pieces of equipment increases, implementing this solution becomes correspondingly more difficult; its expandability is insufficient, rendering it unsuitable for use in a large-scale system.
While the system is running, when a node is found to be overloaded, the only remedies are to replace physical equipment to increase processing capability or to modify the topological structure manually. Neither provides a fast solution, and system reliability therefore cannot be ensured. Furthermore, when the control node at the top level fails, the whole system may be left in an unsupervised state and cannot recover automatically.
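The two weaknesses described above, overload of a control node and failure of the top-level control node, can be illustrated with a minimal sketch. All names here (`Node`, `reachable`, `overloaded`, `max_children`) are hypothetical, and measuring overload by the number of directly managed children is an assumption made purely for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a fixed management tree (hypothetical model)."""
    name: str
    children: list[Node] = field(default_factory=list)
    alive: bool = True


def reachable(node: Node) -> list[str]:
    """Names of the nodes an instruction entering at `node` can still reach."""
    if not node.alive:
        return []  # a failed node forwards nothing to its subtree
    names = [node.name]
    for child in node.children:
        names.extend(reachable(child))
    return names


def overloaded(node: Node, max_children: int) -> list[str]:
    """Control nodes that manage more children than the assumed limit."""
    found = [node.name] if len(node.children) > max_children else []
    for child in node.children:
        found.extend(overloaded(child, max_children))
    return found


control_a = Node("control-a", [Node("c1"), Node("c2"), Node("c3")])
control_b = Node("control-b", [Node("c4")])
top = Node("top", [control_a, control_b])

hot_spots = overloaded(top, max_children=2)   # control-a exceeds the limit
before = reachable(top)                       # all seven nodes
top.alive = False
after = reachable(top)                        # nothing is reachable
```

Because the tree is preconfigured, the sketch offers no way to move `c3` under `control-b` or to promote another node when `top` fails; either change would require editing the structure by hand, which mirrors the manual intervention described above.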