1. Field of the Invention
The present invention relates to a method and system for recovering data from a network element at a network management system (NMS), such as at NMS restart, or at re-establishment of communication after loss of communication with the NMS.
2. Description of the Related Art
As telecommunications services have proliferated, telecommunications networks have become increasingly complex. Today, telecommunications networks, using technologies such as Synchronous Optical Network (SONET), Dense Wavelength Division Multiplexing (DWDM), Asynchronous Transfer Mode (ATM), Ethernet, etc., may extend world-wide and may include thousands of network elements (NEs). Typically, such networks include network management servers (NMSs) that provide the capability to manage, provision and maintain the thousands of network elements. Often, there is only a single server in such a network. Problems arise when this server experiences a failure, either due to hardware or software failure on the server, or due to loss of communications by the server. In order to recover from such a failure, the NMS must re-establish its management control over all of the thousands of NEs in the network.
Typically, NE recovery includes three basic operations:
    NE Login: The NMS must establish connectivity with the NE and log in to open a session on the NE.
    Alarm/Fault Recovery: The NMS must fetch active NE faults and reconcile the NE faults in the NMS database.
    NE Data Recovery: The NMS must fetch NE configuration data and update the NMS database.
These three basic operations, however, have significantly different costs in terms of time, NMS computing load, and network traffic load. The relative costs of the recovery operations are as follows:
    NE Login: A lightweight operation. Cost is very small.
    Alarm/Fault Recovery: A fairly lightweight operation. An NE usually has few active faults. Cost is fairly small.
    NE Data Recovery: A heavyweight operation. Cost is very high (depends on the size of the NE configuration data).
NE recovery must be executed in three situations: NMS restart after a hardware or software failure of the NMS, re-establishment of communication with the NE after loss of communication with the NE, and re-login into the NE after NE logout. Because any of these situations triggers recovery, the need to perform recovery of the network may arise more frequently than desired. NE Login and Alarm/Fault Recovery are relatively lightweight operations, involving relatively short times to perform and consuming relatively few network resources. NE Data Recovery, by contrast, is a heavyweight operation: the NMS must fetch the NE configuration data, process it, and persist it in the NMS database. As each NE may have a relatively large amount of data to be recovered, the cost to perform NE data recovery on even one NE can be high. In cases where recovery of multiple or all NEs in the network must be performed, such as at NMS restart or communication re-establishment, the cost to perform NE recovery increases accordingly. As networks may include thousands of network elements, the cost of network recovery may be very high.
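The way the heavyweight operation dominates network-wide recovery can be illustrated with a small calculation. The sketch below is illustrative only: the per-operation costs are hypothetical relative units chosen for the example, not figures from this description, and the function name is likewise an assumption.

```python
# Hypothetical relative costs per NE, in arbitrary units.
# These values are assumptions for illustration, not measured figures.
LOGIN_COST = 1            # NE Login: very small
FAULT_COST = 5            # Alarm/Fault Recovery: fairly small
DATA_COST = 200           # NE Data Recovery: very high


def network_recovery_cost(num_nes: int) -> dict:
    """Estimate the cost of recovering num_nes NEs, each needing all three operations."""
    per_ne = LOGIN_COST + FAULT_COST + DATA_COST
    return {
        "total": per_ne * num_nes,
        "data_recovery": DATA_COST * num_nes,
        # Fraction of the total cost that is spent on NE Data Recovery.
        "data_share": DATA_COST / per_ne,
    }


if __name__ == "__main__":
    cost = network_recovery_cost(1000)
    print(cost["total"], cost["data_recovery"], round(cost["data_share"], 3))
```

Under these assumed values, nearly all of the recovery cost for a network of 1000 NEs is attributable to NE Data Recovery, which is why that operation is the natural target for any cost reduction.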
In addition, since the basic (and the most important) NE operations cannot be executed until network recovery has completed, customer availability is significantly reduced.
A typical prior art network recovery process is shown in FIG. 1. Such a prior art recovery solution implements a depth-first recovery approach. With such an approach, the elapsed time before the NMS can start performing network monitoring can be very long. All steps of the recovery for an NE must be completed before the recovery of the next NE can start. In particular, NE Login 1, Fault Recovery 2, and NE Data Recovery 3 must be completed for NE-1 before any recovery steps can be performed on NE-2. Problems arise with this approach in that network fault monitoring can start only after all the NEs in the network have been recovered. In addition, in some network management systems, if the NE has not been recovered or if NE recovery is in progress, no operations are allowed on the NE.
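The depth-first ordering described above can be sketched as a simple schedule generator. This is a minimal illustration, not the prior art implementation: the function names and the three-NE example network are hypothetical, and each step is treated as an opaque unit of work.

```python
from typing import List, Tuple

# The three recovery operations, in the order they are performed on each NE.
STEPS = ("NE Login", "Fault Recovery", "NE Data Recovery")


def depth_first_schedule(ne_names: List[str]) -> List[Tuple[str, str]]:
    """Return (NE, step) pairs in depth-first order.

    Every step for one NE is scheduled before any step for the next NE.
    """
    return [(ne, step) for ne in ne_names for step in STEPS]


def data_recoveries_before_last_fault_recovery(
    schedule: List[Tuple[str, str]]
) -> int:
    """Count heavyweight NE Data Recovery steps that run before the
    final Fault Recovery step in the schedule."""
    last_fault = max(
        i for i, (_, step) in enumerate(schedule) if step == "Fault Recovery"
    )
    return sum(1 for _, step in schedule[:last_fault] if step == "NE Data Recovery")


if __name__ == "__main__":
    schedule = depth_first_schedule(["NE-1", "NE-2", "NE-3"])
    for ne, step in schedule:
        print(ne, step)
    print(data_recoveries_before_last_fault_recovery(schedule))
```

In this sketch, the faults of the last NE are fetched only after the heavyweight data recovery of every preceding NE has finished, which is the source of the long delay before network fault monitoring can begin.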
A need arises for a technique by which the costs of network recovery of multiple NEs, such as at NMS restart, or at re-establishment of communication after loss of communication with the NMS, can be reduced. In addition, a need arises for a technique by which the elapsed time before the NMS can start performing network monitoring can be reduced.