The present invention generally relates to management platforms used to manage multiple customer networks and, specifically, to processes, apparatus, and systems used to construct management platforms consistent with Simple Network Management Protocol ("SNMP") to manage multiple customer networks.
Existing network management tools, such as Hewlett Packard's OpenView Network Node Manager ("HP's NNM"), utilize graphical displays of network components and generally utilize color to relay information. These systems are generally used to manage and control networks, in which they generally provide notification of the status of network elements, particularly failed elements. Networks are generally comprised of computer communications equipment, including, but not limited to, routers, switches, hubs, and servers. HP's NNM can be viewed as being representative of the architecture and approach used by current commercial network management tools and, thus, is used herein to explain some of the problems with existing approaches.
These existing network management tools have a number of problems. Specifically, the displays are not helpful. Since color (shown as varying grey shades in FIGS. 1 and 2) is used to relay information, alarms can be hidden by an inappropriate color change threshold. In particular, as shown in FIGS. 1 and 2, HP's NNM maps use shapes that represent collections or managed objects. As shown in FIGS. 1 and 2, each object can be 'exploded' by opening the object until the lowest level is reached. Each aggregate object can have only one (1) of six (6) colors to represent the number of elements grouped together in that aggregate object that are in alarm condition, the color of the aggregate object being determined on a fractional basis. Consequently, in certain circumstances, HP's NNM maps fail to communicate the occurrence of an alarm, as the presentation mechanism fails to relay the information to the user of the system in a way that makes the new failure apparent. For example, it may require the user to open additional windows, which, at a certain point, becomes impractical. At the aggregate object layer, as shown in FIGS. 1 and 2, the overall color of the aggregate object may not actually change, even though individual elements of a specific aggregate object may fail.
Specifically, FIG. 1 is a typical view of an application of HP's NNM, as it appears on an engineer's monitor, with one alarm, and FIG. 2 is a typical view of an application of HP's NNM, as it appears on a computer monitor, with multiple alarms. It is difficult to track the number of alarms in both FIGS. 1 and 2, especially in FIG. 2. The upper left-hand sub-window, which is labeled "IP Internet," has not changed colors (or grey shades) between FIGS. 1 and 2, which illustrates how changes can be hidden. The color level did not change with the additional alarm, due to the number of objects represented below the "IP Internet" symbol (shown in the sub-windows below) that were not in an alarm condition. Since these maps can be many levels deep, this problem can occur at any level. Additional sub-windows must be opened to avoid the averaging problem, which makes the overall display extremely crowded. Similarly, new alarms in existing systems can be hard to see or detect. Even if the change of status in an individual element does, in fact, change the color of the aggregate object, the change in color can be hard to detect on the display. For example, displays used in these modern systems are typically filled with numerous colored objects and the operator may not notice one more colored icon.
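The averaging effect described above can be illustrated with a short sketch. The threshold fractions and color names below are hypothetical values chosen only for illustration; they are not taken from HP's NNM. The sketch assumes only what the text states: an aggregate object takes one of six colors, selected by the fraction of its elements in alarm.

```python
# Hypothetical fractional thresholds: (fraction that must be exceeded, color).
# These values are illustrative assumptions, not HP NNM's actual settings.
THRESHOLDS = [
    (0.75, "red"),
    (0.50, "magenta"),
    (0.25, "orange"),
    (0.10, "yellow"),
    (0.00, "cyan"),
]


def aggregate_color(alarmed: int, total: int) -> str:
    """Return one of six colors based on the fraction of elements in alarm."""
    if alarmed == 0:
        return "green"
    fraction = alarmed / total
    for minimum, color in THRESHOLDS:
        if fraction > minimum:
            return color
    return "green"  # not reached when alarmed > 0


# An aggregate of 200 elements: a second alarm does not change the color.
one_alarm = aggregate_color(1, 200)
two_alarms = aggregate_color(2, 200)
print(one_alarm, two_alarms)
```

Under these assumed thresholds, one alarm and two alarms among 200 elements fall in the same color band, so the new failure is invisible at the aggregate level until the operator opens the sub-windows.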
Also, information displayed by modern systems is difficult to relate or otherwise view. Particularly, the objects used in these modern systems are capable of relating only a limited amount of textual data. For instance, please refer to FIGS. 3 and 4. FIG. 3 is a typical view of HP's NNM, as it appears on a computer monitor, showing external data capabilities. FIG. 4 is a typical view of HP's NNM, as it appears on a computer monitor, showing internal data capabilities. A right click via a standard mouse on a symbol will bring up a menu of options, one of which is to view/modify the object description, but not the relative size of the comments section. This dialog box presents an opportunity to record some relevant external information about the symbol that is reporting the alarm, but, unfortunately, the opportunity is effectively wasted, since it is extremely difficult and time consuming to enter each field by hand and only one or two pieces of information can be shown at a time. For each device, several entries would be required and there may be 1000's to 100,000's of devices. Typically, the label for an object is generated by the HP's NNM application, is indicative of some data internal to HP's NNM, and is not related to any external data such as city name or device name.
Furthermore, applications using existing systems are difficult to administer, as the preferred tools are complex and typically require specialized training just to operate the tool. Moreover, scalability is questionable and expensive, as there is a limit to the size of network that HP's NNM can manage, and even for small networks (fewer than 500 sites) the hardware and software licenses are expensive. Finally, modern systems are slow and limited in the total number of sites that can be reviewed. For instance, actual embodiments of HP's NNM have not been shown to work reliably for more than 500 sites. Actual embodiments of HP's NNM took from fifteen (15) minutes to hours to display information about failed devices and stopped functioning about once a week.
Existing designs and procedures have other problems as well.
Preferred embodiments pertain to an apparatus and related methods and systems that generally manage networks. Note that preferred methods are preferably performed by the preferred apparatus and systems and are discussed in reference to the preferred apparatus and systems.
Preferred embodiments generally implement the following procedure to operate preferred systems: (i) the SNMP Poll application loads from a database a list of interfaces to be monitored; (ii) the SNMP Poll sends out SNMP requests and tracks responses to determine which interfaces are reachable and which are not; (iii) if the SNMP Poll fails to reach an interface two (2) consecutive times, a message is sent to the server; (iv) the server checks the interface a total of ten (10) more times and, if the interface replies six (6) or fewer times to the ten (10) requests, an alarm is generated, and, if the interface replies seven (7) or more times to the ten (10) requests, a message is sent back to the SNMP Poll and the interface is placed in the poll queue; (v) the server generates an alarm, if necessary, by associating information from the OSS database with the interface address; (vi) the server distributes the alarm by sending an alarm message to all attached display devices (e.g., a display server and client); (vii) a client can display the alarm information in a hierarchical tree structure; and (viii) the server monitors the interface to determine when the interface becomes reachable again and generates a clear message, which is formatted and sent to the clients, and the server then sends a message to the SNMP Poll to return the interface to the poll queue.
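Steps (iii) and (iv) of the procedure above can be sketched as a two-stage check. This is a minimal illustration, not the actual implementation: the function names, the `probe` callable (standing in for one SNMP request that returns True on a reply), and the `scripted_probe` helper are all hypothetical; only the counts (two consecutive misses, ten follow-up checks, seven replies to clear) come from the text.

```python
from typing import Callable, Iterable

POLLER_MISSES_BEFORE_ALERT = 2  # step (iii): consecutive failed polls
SERVER_CHECKS = 10              # step (iv): follow-up checks by the server
MIN_REPLIES_TO_CLEAR = 7        # step (iv): seven or more replies, no alarm


def poller_suspects(probe: Callable[[], bool]) -> bool:
    """Return True if the interface misses two consecutive polls."""
    for _ in range(POLLER_MISSES_BEFORE_ALERT):
        if probe():
            return False
    return True


def server_confirms_failure(probe: Callable[[], bool]) -> bool:
    """Probe ten more times; six or fewer replies confirms the failure."""
    replies = sum(1 for _ in range(SERVER_CHECKS) if probe())
    return replies < MIN_REPLIES_TO_CLEAR


def classify(probe: Callable[[], bool]) -> str:
    """Return 'reachable', 'alarm', or 'requeue' for one interface."""
    if not poller_suspects(probe):
        return "reachable"
    if server_confirms_failure(probe):
        return "alarm"
    return "requeue"  # interface goes back into the poll queue


def scripted_probe(results: Iterable[bool]) -> Callable[[], bool]:
    """Hypothetical test helper: replay a fixed sequence of probe results."""
    iterator = iter(results)
    return lambda: next(iterator)


# Two misses, then only six replies out of ten: an alarm is generated.
print(classify(scripted_probe([False, False] + [True] * 6 + [False] * 4)))
# Two misses, then seven replies out of ten: the interface is requeued.
print(classify(scripted_probe([False, False] + [True] * 7 + [False] * 3)))
```

Separating the cheap first-pass check from the more thorough confirmation keeps the poller moving through its queue while the server absorbs the cost of verifying suspect interfaces.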
Preferred embodiments are used to manage a network by monitoring at least one interface of the network and are generally comprised of a poller, a server, a database, and a client applications module. The poller, server, database, and client applications module are in communication with each other. The poller is in communication with at least one interface of the network. The poller continuously checks at least one interface of the network by continuously sending out a poller query message to at least one interface of the network. The poller sends out the poller query messages to at least one interface in a regular, continuous manner. The poller monitors the responses, if any, received from at least one interface to the poller query message. The poller suspects a first interface of the at least one interface of failing when the poller does not receive a poller reply message in response to the query messages from the first interface within a first time period. The poller continues to monitor the first interface to determine if and when the first interface becomes reachable again and the poller generates a clear message to the server which is formatted and sent to clients and the server then sends a message to the poller to restart sending the poller query messages to the first interface.
The poller sends an alert signal to the server notifying the server that the first interface of at least one interface may be failing when the poller suspects the first interface of the at least one interface is failing. After receiving the alert signal, the server sends out at least one server query signal to the first interface, which the server monitors to determine whether the first interface replies to at least one server query signal by sending at least one server reply message. The server sends out the server query signals to the first interface in a regular, continuous manner. The server evaluates at least one server reply message to determine whether the first interface is failing by sending out a first number, such as ten (10), of the server query signals to the first interface, and further wherein the server determines whether the first interface is failing by counting the server reply messages received and, if the server reply messages received are below a minimum number, such as seven (7), then the server determines that the first interface must be failing.
The database contains information concerning at least one interface of the network. When the server determines the first interface is failing, the server pulls first information concerning the first interface and sends an alarm signal with the first information to client applications modules. The database also stores alarm information comprised of information about the alarm signal and the server stores the alarm information about the alarm signal in the database.
The server communicates with the client applications module via a display server. The display server receives the alarm signal and the alarm information, organizes the alarm information, and presents the alarm signal and the alarm information to the client applications module. The client applications module displays the alarm information in a hierarchical tree structure.
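As a rough illustration of how alarm information drawn from the database might be organized into a hierarchical tree, consider the sketch below. The record fields (customer, site, contact), the sample data, and the customer/site/interface hierarchy are assumptions made for the example; the text does not specify the database schema or the levels of the tree.

```python
from collections import defaultdict

# Hypothetical stand-in for the database: interface address -> record.
INTERFACE_INFO = {
    "10.0.0.1": {"customer": "Acme", "site": "Dallas", "contact": "555-0100"},
    "10.0.1.1": {"customer": "Acme", "site": "Austin", "contact": "555-0101"},
    "10.1.0.1": {"customer": "Globex", "site": "Boston", "contact": "555-0102"},
}


def build_alarm(address: str) -> dict:
    """Associate database information with a failing interface address."""
    return {"address": address, **INTERFACE_INFO[address]}


def build_tree(alarms: list) -> dict:
    """Group alarms as customer -> site -> list of alarms."""
    tree = defaultdict(lambda: defaultdict(list))
    for alarm in alarms:
        tree[alarm["customer"]][alarm["site"]].append(alarm)
    return {customer: dict(sites) for customer, sites in tree.items()}


def render(tree: dict) -> str:
    """Render the tree as indented text, one level per indent step."""
    lines = []
    for customer in sorted(tree):
        lines.append(customer)
        for site in sorted(tree[customer]):
            lines.append("  " + site)
            for alarm in sorted(tree[customer][site], key=lambda a: a["address"]):
                lines.append(f"    {alarm['address']}  contact: {alarm['contact']}")
    return "\n".join(lines)


alarms = [build_alarm(a) for a in ("10.0.0.1", "10.0.1.1", "10.1.0.1")]
print(render(build_tree(alarms)))
```

Because every alarm appears as its own leaf, a new failure adds a visible line rather than being averaged into the color of a parent object.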
Preferred embodiments provide a number of advantages. With respect to the operation of the preferred embodiment, preferred embodiments adopt or utilize a distributed architecture, which can be extended over several machines and multiple processors. Preferred embodiments also utilize parallel operation of various features and functions, so that multiple, parallel outbound queues can be used to optimize polling efficiency. Specifically, preferred embodiments are able to effectively touch or access every interface in a customer base in less than one (1) minute, allow a maximum of 1250 simultaneously outstanding requests, and poll at a rate of up to 120 interfaces per second per SNMP machine. Preferred embodiments adopt randomized outbound polling, so as to provide even loading to customer/carrier networks. Preferred embodiments are easily integrated with a database (e.g., an Oracle database), can adopt a client-server model, and can be used with multiple clients. Preferred embodiments are scalable, such that preferred embodiments are capable of monitoring many systems and many customers simultaneously. The architecture of preferred embodiments allows for multiple SNMP polling machines and allows an extended interval (i.e., 8000 ms) for return of a response. Preferred embodiments run on a 'low level' hardware platform. Preferred embodiments allow updates of the underlying database while the system is in operation. In contrast to currently available commercial applications, preferred embodiments are functionally focused, providing maximum performance in a narrow functional area. Preferred embodiments make the first call in reference to the loss of contact and then pass those locations to a separate Investigation Queue.
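The polling figures above imply a simple capacity calculation, and the randomized, parallel outbound queues can be sketched in a few lines. This is an illustrative sketch only: the function names, the round-robin split, and the one-minute cycle parameter are assumptions; the rate of 120 interfaces per second per SNMP machine comes from the text.

```python
import math
import random

POLL_RATE_PER_MACHINE = 120  # interfaces per second per SNMP machine (from the text)


def machines_needed(num_interfaces: int, cycle_seconds: int = 60) -> int:
    """Estimate the polling machines needed to touch every interface per cycle."""
    per_machine = POLL_RATE_PER_MACHINE * cycle_seconds
    return math.ceil(num_interfaces / per_machine)


def randomized_queues(interfaces, num_queues: int, seed=None):
    """Shuffle the interfaces, then deal them round-robin into parallel queues.

    Shuffling spreads requests to any one customer network across the cycle
    instead of sending them in a single burst.
    """
    rng = random.Random(seed)
    shuffled = list(interfaces)
    rng.shuffle(shuffled)
    return [shuffled[i::num_queues] for i in range(num_queues)]


# At 120 interfaces per second, one machine covers 7200 interfaces per minute.
print(machines_needed(20000))
queues = randomized_queues([f"if-{n}" for n in range(10)], num_queues=4, seed=1)
print([len(q) for q in queues])
```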
With respect to the presentation provided by preferred embodiments, an alarm is generally viewed as a notification that something is broken. Consequently, preferred embodiments associate an alarm condition with other pertinent information, such as the physical address of the device in addition to its network address and contact information, such as telephone numbers and names of local operators. This information is presented in two (2) ways: (i) in a hierarchical tree structure to relay the current state of the entire 'managed network space' and (ii) in a table structure to relay an historical view that describes a recent event. As a result, the presentation found in preferred embodiments is convenient and timely. For example, preferred embodiments provide the following types of information: (i) "Spring" to determine connectivity; (ii) "Squery" to gather basic SNMP statistics; (iii) "Dynamic Un-Manage" to un-manage a client interface; (iv) "Dynamic Re-Manage" to add an interface to the managed list; (v) "Automated NetRep Load" to load the database; (vi) "Interface Reports" to determine extent of managed devices; (vii) "Event Reports" to summarize activity by site, customer, and date range; (viii) "On Demand Statistics" to manage interfaces, sites by Group and Team; (ix) "2 Part Display" to show hierarchical and historical information pertaining to the network; (x) "Team Delivery of Alarms" to allow a user to choose a view by team and/or group; (xi) "Control Events" to automatically un-manage and subsequently re-manage network interfaces at pre-specified times; and (xii) "Display Server" to relay messages between multiple client applications and server applications, as it shares the load and relieves the server of some of the communications tasks.
Display servers used in preferred embodiments allow the connection of sixteen (16) clients per display server and a total of approximately 128 clients or more. The architecture also requires fewer resources on the server than an architecture having all clients attached directly to the server.
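The load-sharing effect of display servers can be illustrated with a small fan-out sketch. The `DisplayServer` class, the use of plain lists as client inboxes, and the `attach_clients` helper are hypothetical; only the limit of sixteen clients per display server and the figure of 128 clients come from the text.

```python
MAX_CLIENTS_PER_DISPLAY_SERVER = 16  # from the text


class DisplayServer:
    """Relays each message from the server to its attached clients."""

    def __init__(self):
        self.clients = []  # each client is modeled as a list acting as an inbox

    def attach(self, inbox: list) -> bool:
        """Attach a client; return False if this display server is full."""
        if len(self.clients) >= MAX_CLIENTS_PER_DISPLAY_SERVER:
            return False
        self.clients.append(inbox)
        return True

    def relay(self, message: str) -> int:
        """Deliver the message to every attached client; return the count."""
        for inbox in self.clients:
            inbox.append(message)
        return len(self.clients)


def attach_clients(num_clients: int) -> list:
    """Fill display servers in order, adding a new one when the last is full."""
    servers = []
    for _ in range(num_clients):
        inbox = []
        if not servers or not servers[-1].attach(inbox):
            server = DisplayServer()
            server.attach(inbox)
            servers.append(server)
    return servers


servers = attach_clients(128)
# The server sends one message per display server rather than one per client.
print(len(servers), sum(s.relay("alarm") for s in servers))
```

In this sketch the server sends eight messages to reach 128 clients, which is one way to read the statement that the architecture relieves the server of some communications tasks.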
Other advantages of the invention and/or inventions described herein will be explained in greater detail below.