A management system is typically used to manage (e.g., monitor and control) the operation of ever-larger networked systems and networks of networked systems. A distributed system (e.g., a computer or communication system) generally includes many individual components (e.g., nodes or devices), which may be implemented using both hardware and software elements. The individual devices, and the relationships between them, conventionally define the “topology” of a distributed system.
A management system typically includes a plurality of agents that are assigned to a centralized manager. The agents of the management system are used to monitor, control, and otherwise influence the behavior of the devices or elements of the managed distributed system. These agents may be any suitable software or hardware element that is capable of collecting information, e.g., statistics, about the behavior of a device and/or enacting required changes to the device. Moreover, any number of the components in a distributed system may be associated with one or more agents, although each component for which monitoring and/or control is desired must be associated with at least one agent.
A centralized manager is used to coordinate the operation of the agents in the management system. As is the case with agents, the centralized manager may be any suitable software or hardware element, although it must be capable of performing tasks required (or useful) to monitor or control a distributed system, such as analysis (performance or fault), configuration changes, etc. In many types of management systems, the agents run on, or in the same network as, the respective network devices they are monitoring and/or controlling, while the manager remotely collects information from one or more agents in order to perform its tasks for the system as a whole.
It is important to note that the agents are not required to be on the same network as the managed device or on the device itself. The distinction between the manager and the agent is in their functionality (e.g., monitoring, control, or analysis) rather than their location relative to the devices.
A limitation on the performance of management systems has traditionally been the size of the network or the system being managed. Large systems that have components or elements distributed over a wide geographic area can present an unsustainable computational burden on the management system. One approach often used to alleviate the burden on the management system of a distributed system, and thus to improve scalability, is to create a distributed-architecture management system. In a distributed-architecture management system, a single centralized manager is replaced by a plurality of managers, each of which oversees a subset of the agents in the distributed network or system. Each manager is associated with a respective partition or subset of the distributed-architecture management system.
Many current solutions for dividing the agents among the managers use ad-hoc methods, typically involving manual configuration of the management system. Such methods, however, suffer from several drawbacks. First, the resulting division may not provide an accurate result, because each manager needs enough information to correlate events in the devices it manages as well as in causally related devices that it may not be managing. For example, a failure of a link may go undetected if the two devices adjacent to the link are assigned to different managers. Second, the process is inefficient. In the case of very large networks, with thousands of devices, it is time-consuming to assign devices to managers in order to accomplish preset goals. For example, if one wants to minimize the number of devices that need to be assigned to more than one manager, it may be difficult to develop an algorithm that performs the assignment efficiently for very large networks. Lastly, the process is not scalable, as it is difficult to develop an algorithm that accomplishes preset goals while scaling with the number of agents.
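By way of illustration only, the link-failure drawback described above can be made concrete with a short sketch. The function names, the device and manager labels, and the representation of the topology as a list of two-device links are assumptions introduced here for illustration and are not taken from any particular management system. The sketch lists the links whose two adjacent devices are assigned to different managers, together with the devices that would therefore need to be visible to more than one manager.

```python
def cross_manager_links(links, assignment):
    """Return the links whose two endpoint devices belong to different managers.

    links      -- list of (device, device) pairs (hypothetical topology format)
    assignment -- dict mapping each device to its single assigned manager
    """
    return [(a, b) for a, b in links if assignment[a] != assignment[b]]


def boundary_devices(links, assignment):
    """Return, sorted, the devices adjacent to at least one cross-manager link.

    These are the devices that would have to be assigned to more than one
    manager for every link to be fully visible to some manager.
    """
    cut = cross_manager_links(links, assignment)
    return sorted({device for link in cut for device in link})


# Hypothetical four-device chain split between two managers.
links = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
assignment = {"r1": "mgrA", "r2": "mgrA", "r3": "mgrB", "r4": "mgrB"}

print(cross_manager_links(links, assignment))  # [('r2', 'r3')]
print(boundary_devices(links, assignment))     # ['r2', 'r3']
```

In this example neither manager sees both ends of the link between r2 and r3, so a failure of that link is the kind of event that could go uncorrelated unless r2 or r3 is also assigned to the other manager.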
One solution proposed to overcome the above noted problems is presented in U.S. patent application Ser. No. 11/052,395, entitled “Method and Apparatus for Arranging Distributed System Topology Among a Plurality of Network Managers,” filed on Feb. 7, 2005, the contents of which are incorporated by reference, as if in full, herein. In this proposed solution, network elements or components or agents are assigned to at least one manager and the assignment is iteratively improved until at least one desired criterion regarding the at least one manager is substantially achieved. The improvement upon the assignment is made using a modified Kernighan-Lin algorithm applied to hyper-graphs and multi-partitions.
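The general idea of iteratively improving an assignment can be sketched as follows. This is a deliberately simplified greedy local search on an ordinary graph, not the modified Kernighan-Lin algorithm of the referenced application, which operates on hyper-graphs and multi-partitions. The criterion used here (the number of links whose endpoints fall under different managers), the per-manager load limit `max_load`, and all names are assumptions chosen for illustration.

```python
def cut_size(links, assignment):
    """Count links whose two endpoints are assigned to different managers."""
    return sum(1 for a, b in links if assignment[a] != assignment[b])


def refine(links, assignment, managers, max_load, max_passes=10):
    """Greedy refinement: move one device at a time to another manager
    whenever the move strictly reduces the cut size and the target manager
    stays within max_load devices. Stops after a pass with no improvement."""
    assignment = dict(assignment)          # work on a copy
    load = {m: 0 for m in managers}
    for m in assignment.values():
        load[m] += 1

    for _ in range(max_passes):
        improved = False
        for device in sorted(assignment):
            current = assignment[device]
            best, best_cut = current, cut_size(links, assignment)
            for m in managers:
                if m == current or load[m] >= max_load:
                    continue
                assignment[device] = m     # tentatively try the move
                trial = cut_size(links, assignment)
                if trial < best_cut:
                    best, best_cut = m, trial
            assignment[device] = best      # keep the best option found
            if best != current:
                load[current] -= 1
                load[best] += 1
                improved = True
        if not improved:
            break
    return assignment


# Hypothetical four-device chain with a poor initial assignment.
links = [("a", "b"), ("b", "c"), ("c", "d")]
initial = {"a": "m1", "b": "m2", "c": "m1", "d": "m2"}
refined = refine(links, initial, managers=["m1", "m2"], max_load=3)

print(cut_size(links, initial), "->", cut_size(links, refined))  # 3 -> 1
```

Like other local-search heuristics, a procedure of this kind accepts only improving moves, so the assignment it settles on depends on the starting assignment, and recomputing the criterion for every trial move becomes costly as the number of devices grows.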
However, there are situations wherein the proposed modified Kernighan-Lin algorithm may not converge upon a desired solution or may require an excessive amount of time to complete. The former can occur when the initial reference points lie within a region of solutions that fail to converge, and the latter may occur when the number of elements, nodes, or agents is large.
In view of the foregoing, it would be desirable to provide a fast and reliable method of assigning agents to one or more managers in a distributed-architecture management system.