Communications services are provided over a managed infrastructure of managed communications network nodes and interconnecting links.
FIG. 1 is a schematic diagram showing interconnected network elements implementing connected communications networks.
Network nodes 102, 102-A, 102-B are physically interconnected via physical links 104 in communications networks 100. Communications networks 100 may be bridged via bridge network nodes 106 to enable data content exchange therebetween. Connected communications networks 100 can be grouped defining areas of focus and influence for the purposes of network management, known as network partitions 108.
All data network equipment is subject to design choices which are bound to differ from vendor to vendor. For example, as shown schematically in FIG. 1, an equipment vendor may choose to implement an integral network node device 102-B having a switching processor and a group of ports 110. Another equipment vendor may choose a customizable implementation of a network node 102-A including: a switching fabric, an equipment rack divided into shelves, each shelf 120 having slot connectors 122 for connection with interface cards, each interface card 124 having at least one port 110. The two network nodes 102-A and 102-B provide the same switching function. The network node 102-A is better adapted to provide high throughput.
An exemplary containment hierarchy 200 of managed network entities, shown in FIG. 2, is maintained for network management purposes. Each managed network entity instance in the containment hierarchy 200 corresponds to an installed physical managed entity or a defined logical managed entity in the realm of influence. Exemplary physical managed entities include, but are not limited to: physical links 104, physical ports 110, interface cards 124, shelves 120, network nodes 102, etc. Exemplary logical managed entities include, but are not limited to: network partitions 108, link groups 204, logical trunks 206, logical ports 210, etc.
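By way of illustration only, a containment hierarchy of this kind could be modeled as a tree of entity records. The sketch below is a minimal one under stated assumptions: the class name `ManagedEntity`, its fields, and the `path` helper are hypothetical and are not taken from FIG. 2 or from any particular network management system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManagedEntity:
    """One managed entity instance; `kind` and `name` are illustrative labels."""
    kind: str
    name: str
    children: List["ManagedEntity"] = field(default_factory=list)
    # Back-reference to the containing entity; excluded from repr/compare
    # so that the parent/child cycle is never traversed implicitly.
    parent: Optional["ManagedEntity"] = field(default=None, repr=False, compare=False)

    def add(self, child: "ManagedEntity") -> "ManagedEntity":
        """Attach `child` under this entity and return the child."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        """Return the containment path from the root down to this entity."""
        parts = []
        node: Optional["ManagedEntity"] = self
        while node is not None:
            parts.append(f"{node.kind}:{node.name}")
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical instance: partition 108 > node 102-A > shelf 120 > card 124 > port 110
partition = ManagedEntity("partition", "108")
node = partition.add(ManagedEntity("node", "102-A"))
shelf = node.add(ManagedEntity("shelf", "120"))
card = shelf.add(ManagedEntity("card", "124"))
port = card.add(ManagedEntity("port", "110"))
port_path = port.path()
```

In this sketch the physical containment (node, shelf, card, port) is the tree itself; logical entities such as link groups 204 or logical trunks 206 would need additional association records, which are omitted here.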
Typically link groups 204 are used to provide inverse multiplexing. A link group 204 is typically defined to include a group of physical links 104 used in combination to convey content at the aggregate bandwidth of the group of physical links 104. The group of physical links 104 in the link group 204 connect to a corresponding group of ports 110 associated typically with an interface card 124 providing inverse multiplexing functionality. The corresponding group of physical ports 110 define a logical port 210. In conveying content, a data flow may be routed onto a link group 204, the inverse multiplexing interface card 124 distributing the data flow bandwidth over the individual physical links 104 in the link group 204. From a service provisioning perspective, each physical link 104 in the link group 204 represents a potential hop in a route for a prospective connection path independent of all other physical links 104 in the link group 204.
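The aggregate bandwidth relationship described above can be sketched as follows. The link rates and the round-robin distribution policy are assumptions made for illustration; the actual distribution performed by an inverse multiplexing interface card 124 is not specified here.

```python
from itertools import cycle
from typing import Dict, List


def aggregate_bandwidth(link_rates_mbps: Dict[str, float]) -> float:
    """Aggregate bandwidth of a link group: the sum of its member link rates."""
    return sum(link_rates_mbps.values())


def distribute_round_robin(segments: List[str], links: List[str]) -> Dict[str, List[str]]:
    """Assign data segments to member links in turn (assumed policy)."""
    assignment: Dict[str, List[str]] = {link: [] for link in links}
    for segment, link in zip(segments, cycle(links)):
        assignment[link].append(segment)
    return assignment


# Hypothetical link group with four member links of 1.5 Mbps each
rates = {"link-1": 1.5, "link-2": 1.5, "link-3": 1.5, "link-4": 1.5}
total = aggregate_bandwidth(rates)  # 6.0
plan = distribute_round_robin(["s0", "s1", "s2", "s3", "s4"], list(rates))
```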
Typically logical trunks 206 are used to provide redundant content transport. Each logical trunk 206 is typically defined to include at least one designated active physical link 104, actively used for conveying content, and at least one designated standby physical link 104, reserved to convey content in the event that the associated active physical link 104 experiences a failure. Typically the physical links 104 in a logical trunk 206 connect to physical ports 110 on different interface cards 124 to provide redundancy. The corresponding group of physical ports 110 define a logical port 210. In conveying content, a data flow may be switched to the logical port 210, the combination of interface cards 124 cooperating to direct content transport over the active physical link 104 or the standby physical link 104 depending on the operational status of the designated active equipment (physical link 104, corresponding physical port 110, corresponding interface card 124, etc.).
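The active/standby selection described above can be sketched as a simple decision function. The function name, the link identifiers, and the boolean operational-status map are hypothetical; real equipment would also consider the status of the associated ports 110 and interface cards 124.

```python
from typing import Dict, Optional


def select_transport_link(active: str, standby: str,
                          operational: Dict[str, bool]) -> Optional[str]:
    """Pick the link that conveys content for a logical trunk.

    Prefers the designated active link, falls back to the standby link,
    and returns None when neither is operational.
    """
    if operational.get(active, False):
        return active
    if operational.get(standby, False):
        return standby
    return None


# Hypothetical trunk whose active link has failed
chosen = select_transport_link(
    "link-active", "link-standby",
    {"link-active": False, "link-standby": True},
)  # "link-standby"
```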
Network management is concerned, at least in part, with monitoring managed communications network equipment to ensure adherence to a defined communications network state. Reported alarms provide information regarding departures from the defined communications network state. Fault management includes attending to alarms in an attempt to restore the managed communications network to the defined network state.
The definition of the communications network state includes configuring operational parameters associated with managed communications network equipment to operate in a desired fashion. A Network Management System (NMS) 230 is used to interact with the field installed communications network equipment either directly or indirectly via interaction with communication network entity instances in the containment hierarchy 200. Alarm information is reported to the NMS 230 and status registers associated with the corresponding communications network managed entity instances in the containment hierarchy 200 are updated accordingly.
A network management system such as an Alcatel 5620 NMS implements a network management tool 300 for interacting with the containment hierarchy 200 to provide an operator typically with a visual display of the managed communications network state.
A person familiar with network management understands that the amount of configuration, status, and alarm information maintained via the containment hierarchy 200 is so great that it cannot possibly all be displayed on an NMS console, not even for what today is considered a simple communications network 100. The network management tool 300 extracts: managed entity instance identifiers, associations between managed entity instances, and managed entity states from the containment hierarchy 200. The network management tool 300 filters the managed entity identifiers and the associations between the managed entity instances to display a summary high-level view of interconnected managed communications network entities referred to as a network map 310. Typically the network map 310 displays high-level managed entities such as network nodes and interconnecting links. The network management tool 300 processes the extracted alarm information to derive summary operational states for the displayed managed communications network entities. The derived summary operational states are also displayed in the network map 310. On a continuing basis, the network management tool 300 scans the containment hierarchy 200 and/or is provided with managed entity status change information to update the network status displayed via the network map 310.
The displayed portions of the network map 310 may be viewed at different levels of complexity by interacting therewith. High-level views of the network map 310 combine groups of associated managed communication network entities under single iconical representations thereof. For example, physical links 104 associated with either logical link groups 204 or logical trunks 206 are not shown, but rather the logical link group 204 and/or the logical trunks 206 are iconically shown as interconnecting links. Ports 110, logical ports 210, interface cards 124 and shelves 120 are not shown, while communication network nodes 102 are shown as icons. Summary alarm information is typically concurrently displayed via predefined characteristics, typically color, ascribed to the iconical representations.
For purposes of effecting network management, it is imperative that all alarm status information received is used to update the current network state and that departures from the desired operation are displayed in the network map 310 as exemplarily shown in FIG. 3. Low-level alarm information reported by managed field installed equipment is propagated to high-level managed entity instances along the managed entity associations specified in the containment hierarchy 200, and displayed via the corresponding high-level iconical representations thereof in the summary network map 310 view. For greater certainty, the most severe state is always propagated to the displayed high-level managed entities to enable troubleshooting of failed infrastructure. Therefore the high level network map 310 displayed enables macro-management of the managed infrastructure in the realm of management.
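The propagation of the most severe state to high-level entities can be sketched as a recursive maximum over the containment associations. The severity names, the dictionary-based hierarchy, and the function name are assumptions chosen for illustration; they are not the severity levels of any particular NMS.

```python
from enum import IntEnum
from typing import Dict, List


class Severity(IntEnum):
    """Illustrative severity scale; higher values are more severe."""
    CLEAR = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def summary_state(entity: str,
                  children: Dict[str, List[str]],
                  own_state: Dict[str, Severity]) -> Severity:
    """Return the most severe state found at `entity` or anywhere beneath it."""
    worst = own_state.get(entity, Severity.CLEAR)
    for child in children.get(entity, []):
        worst = max(worst, summary_state(child, children, own_state))
    return worst


# Hypothetical hierarchy: a failed port marks its whole containing node
children = {
    "node 102-A": ["shelf 120"],
    "shelf 120": ["card 124"],
    "card 124": ["port 110-1", "port 110-2"],
}
own_state = {"port 110-1": Severity.CRITICAL, "port 110-2": Severity.MINOR}
node_state = summary_state("node 102-A", children, own_state)  # Severity.CRITICAL
```

Because only the maximum is kept, the summary view cannot distinguish a node with one failed port from a node that has failed entirely.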
It is instructive to note that alarm information may not necessarily be reported by the failed equipment itself, although the possibility is not excluded, as the failed equipment may not be operational to the extent to which alarm information reporting is possible. Typically, managed entities associated with the failed equipment report alarm information. It is further understood that a failure experienced by a single managed entity may cause managed entities associated therewith to also experience failure. For example, an interface card 124 having a port 110 experiencing a failure may still be operational to the extent that the interface card 124 is able to report the failure in generating alarm information, while the physical link 104 associated with the failed port 110 also experiences a failure by virtue of not being able to convey content to the failed port 110. Managed equipment associated with the other end of the affected physical link 104 will also report that the affected physical link 104 is experiencing a failure.
In accordance with a typical service provisioning scenario shown in FIG. 3, a user operating a host network node 302 seeks services provided via a server 312. Both the host network node 302 and the server 312 employ content transport services of the communications network 100. The host network node 302 is connected to network node 102-H and the server 312 is connected to the network node 102-S. The network map 310, presented in FIG. 3, shows seven instances of infrastructure failures in the communications network 100.
To reveal the managed entities experiencing failures, and from which the alarm information was inherited, an analyst uses the NMS 230, typically by interacting with the network map 310, to navigate the containment hierarchy 200 to expose underlying managed network infrastructure in greater and greater detail. Inspecting low-level managed network entity records enables micro-management thereof.
The network state shown in FIG. 3 displays high-level affected managed equipment via an exemplary network management color-based failure reporting scheme. In particular the color “red” is used to indicate equipment failures whereas “green”, for example, would indicate that the managed equipment is performing in accordance with the current configuration of respective operational parameters. Other failure severity reporting schemes may be employed including the use of audible signals.
In employing the network management failure reporting scheme, if just one physical link 104 in a link group 204 is “unavailable” or if one of the active and standby links 104 in a logical trunk 206 is “unavailable”, the corresponding high-level link group 204 or logical trunk 206 managed entity in the containment hierarchy 200 inherits the “unavailable” status and the high-level entities are shown in the network map 310 in red. The network management failure reporting scheme certainly applies to failures experienced by all managed entities and therefore network nodes 102 may be shown in red if sub-components such as, but not limited to: a shelf 120 or the corresponding switching fabric, are experiencing failures (loss of power for example). Therefore all alarms are treated the same and propagated to the high-level network status view 310.
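The inheritance rule of this reporting scheme can be sketched as follows. The status strings, function names, and two-color mapping are illustrative assumptions that mirror the “red”/“green” example above.

```python
from typing import List

AVAILABLE = "available"
UNAVAILABLE = "unavailable"


def inherited_status(member_status: List[str]) -> str:
    """A link group or logical trunk inherits "unavailable" if any member has it."""
    return UNAVAILABLE if UNAVAILABLE in member_status else AVAILABLE


def map_color(status: str) -> str:
    """Color ascribed to the high-level icon in the network map."""
    return "red" if status == UNAVAILABLE else "green"


# Hypothetical link group: one of four physical links has failed
group_status = inherited_status([AVAILABLE, AVAILABLE, UNAVAILABLE, AVAILABLE])
group_color = map_color(group_status)  # "red"
```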
In particular three high-level interconnecting links associated with network node 102-H are shown in red in FIG. 3. By interacting with the iconical representations of the high-level interconnecting links, details of the corresponding low-level infrastructure are revealed. Individual physical links 104 from which the “unavailable” status was inherited are shown in exploded views.
In accordance with the network management failure reporting scheme, the alarm information received is not qualified to further address service provisioning aspects. The underlying assumption is that, if the network management function provides a fully operational network infrastructure, then the service provisioning function can always succeed in establishing connections, further assuming available bandwidth. Although network infrastructure failures including: high bit error rates, large numbers of data segments being dropped/lost, scheduled maintenance, etc. may affect the quality of service provided, none of these have a terminal impact on service provisioning.
Certainly as long as a single physical link 104 in a link group 204 or a logical trunk 206 can convey content, connections can be established if there is available bandwidth. The three high-level links associated with the network node 102-H and shown in red do have at least one operational physical link 104 as part of the respective link groups 204 and the logical trunk 206.
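The contrast between the two points of view can be made concrete with a short sketch. Both function names, the boolean member-status list, and the bandwidth parameters are hypothetical; the sketch only restates the condition given above and does not represent any existing NMS behavior.

```python
from typing import List


def shown_as_available(member_operational: List[bool]) -> bool:
    """Network management view: available only if every member link is operational."""
    return all(member_operational)


def can_carry_connections(member_operational: List[bool],
                          free_bandwidth_mbps: float,
                          requested_mbps: float) -> bool:
    """Service provisioning view: usable if at least one member link can
    convey content and enough bandwidth remains for the request."""
    return any(member_operational) and free_bandwidth_mbps >= requested_mbps


# Hypothetical link group with one failed member out of three
members = [True, False, True]
displayed_ok = shown_as_available(members)               # False: shown in red
usable = can_carry_connections(members, 3.0, 1.0)        # True: still usable
```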
As the propagation of the alarm information to high-level managed entities marks the high-level managed entities as “unavailable”, and if the network state displayed 310 is used to direct connection establishment in service provisioning, a significant portion of the high level managed infrastructure may be marked as “unavailable” when in fact only sub-components thereof are affected by the corresponding experienced failures. Consequently, connection route trace diagnostics tools used for troubleshooting connections, in processing information stored in the containment hierarchy 200, find the high-level managed entities “unavailable” and therefore report “unavailable resources” errors.
All alarms reported can be addressed by the network management function, and certainly if all failures were attended to, then service provisioning would not encounter “unavailable resources” conditions. However, the complexity of present day communications networks 100 has increased to such an extent that attending to all reported alarms is such an involved procedure that the ensuing service provisioning downtime may be overly long.
There is a need to assess network infrastructure failure severity from the point of view of service provisioning.