As computer networks grow, it is important that all elements of the network operate in a coordinated fashion. One important step toward this goal is to ensure that the software on the various network elements is operated and updated in a coordinated manner. The problem of updating pre-existing, region-dependent software without affecting its region-dependent nature, which includes transporting the updated software to the destination (e.g., via the Internet) and then extracting, loading and merging it, has been recognized, for example, by Randall in U.S. Pat. No. 5,978,916. This patent teaches a method, system and computer program for updating software with a common update module.
Certain networks require more than a coordinated software update. For example, communications networks have to operate with minimal downtime for administration and maintenance. While system files or operating software are being updated, the network element has to maintain full capability of transporting communication traffic and ensure minimal interruption of its administration and maintenance capability. This is a difficult task, since operating software consists of files that are write-protected or access-locked to avoid accidental overwriting during routine operation.
In U.S. Pat. Nos. 6,199,203; 6,154,878 and 6,202,205 Saboff et al. teach memory management techniques for on-line replaceable software, e.g., a software library, such that the state of the software component is preserved after an update to the software component. This is accomplished by allocating two types of memory: transient memory and enduring memory (the latter to be preserved between two calls of the library). In this method, when the software is updated to a new version, the transient memory is released, while the enduring memory is preserved for use by the new software version. In U.S. Pat. No. 5,764,989 Gustafsson et al. teach an interactive program or software development system which obviates the need to halt execution of a program under development or during a maintenance update to correct programming errors. Unfortunately, Saboff's technique is limited to the patching of memory and cannot be applied to upgrading operating software in a communications network. Likewise, Gustafsson's teaching cannot be extended to updates of operating systems under the above-mentioned interruption requirements of communications networks.
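The distinction between transient and enduring memory described above can be illustrated with a minimal Python sketch. It is not taken from the cited patents; the names `Memory`, `CounterV1`, `CounterV2` and `replace_component` are hypothetical, and plain dictionaries stand in for the two memory types.

```python
class Memory:
    """Hypothetical store with the two memory types (dicts stand in for memory)."""

    def __init__(self):
        self.transient = {}  # working data, discarded on update
        self.enduring = {}   # component state, kept across updates

    def release_transient(self):
        self.transient.clear()


class CounterV1:
    """Old version of a replaceable component that counts its calls."""

    def __init__(self, memory):
        self.memory = memory
        # Initialize the state only if no earlier version has created it.
        memory.enduring.setdefault("calls", 0)

    def call(self):
        self.memory.transient["scratch"] = "v1 working data"
        self.memory.enduring["calls"] += 1
        return self.memory.enduring["calls"]


class CounterV2(CounterV1):
    """New version: same enduring state, different behavior."""

    def call(self):
        self.memory.transient["scratch"] = "v2 working data"
        self.memory.enduring["calls"] += 1
        return "call #%d" % self.memory.enduring["calls"]


def replace_component(memory, new_version):
    """Release transient memory, then start the new version on the enduring state."""
    memory.release_transient()
    return new_version(memory)


memory = Memory()
library = CounterV1(memory)
library.call()
library.call()
library = replace_component(memory, CounterV2)
print(library.call())  # prints "call #3": the count survived the update
```

In this sketch the call count lives in enduring memory, so the replacement version continues from the old version's state, while the scratch data of the old version is discarded at update time.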
In fact, updating system software in a network poses challenges for the management of operating systems as well as for operational continuity, memory management and data recovery. In U.S. Pat. No. 5,715,462 Iwamoto et al. present a method of updating and restoring system files that is designed for operating system (OS) updates and takes advantage of storing the same OS in separate memory areas. The executing OS in the first memory area is terminated and the OS in the second area is initiated. After the system files stored in the first area are released from access lock, they are replaced with substitute files provided in advance, using a file replacing function of the second OS. When such a file replacement fails for some reason, the original operating system files are immediately restored.
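The replace-and-restore step of such a scheme can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method of the cited patent: the function name `replace_system_files` and the three directory parameters are hypothetical, ordinary directories stand in for the memory areas, and the termination of the first OS and start of the second OS are outside the scope of the sketch.

```python
import shutil
from pathlib import Path


def replace_system_files(system_dir: Path, substitute_dir: Path, backup_dir: Path) -> bool:
    """Replace files in system_dir with the substitutes; restore originals on failure.

    Hypothetical helper: it assumes the files in system_dir are no longer
    access-locked, i.e. the OS that used them has already been terminated.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    touched = []  # targets that were (or were about to be) overwritten
    try:
        for new_file in sorted(substitute_dir.iterdir()):
            target = system_dir / new_file.name
            if target.exists():
                # Keep a copy of the original so it can be restored later.
                shutil.copy2(target, backup_dir / target.name)
            touched.append(target)
            shutil.copy2(new_file, target)
        return True
    except OSError:
        # Replacement failed: put every touched file back the way it was.
        for target in touched:
            saved = backup_dir / target.name
            if saved.exists():
                shutil.copy2(saved, target)
            else:
                # The file did not exist before the update, so remove it.
                target.unlink(missing_ok=True)
        return False
```

A caller would start the updated OS only when the function returns `True`; on `False` the original files are back in place and the old OS can be started again. Note that every failure path in this sketch still requires a restart, which is the cost discussed in the following paragraph.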
Iwamoto's teaching goes a long way toward solving the problem of upgrading or updating operating system software and can be applied in communication networks. It offers safety in that it preserves files to provide for recovery and for restarting the old operating software in case of failure. Unfortunately, Iwamoto's approach has several drawbacks. First, there is a lengthy period of loss of visibility to a network manager. This is the time involved in performing two terminations and activations, or two reboot operations, and the new software installation. In a success scenario this time can be about 15 minutes, and in a worst-case failure scenario it can be close to one hour. Second, this update method has poor failure handling capability with respect to detecting the condition of the system and reporting alarms. Since the application software cannot be started during the procedure, it is not possible to use the alarm mechanisms provided by the application software. Third, implementation and testing are complicated in this approach. The number of possible combinations of failure cases during the reboots can be very large and can cause enormous increases in implementation and testing time.
Therefore, the problem of rapid, simple and effective operating system updates in networks with minimal loss of visibility to a network manager remains unsolved. This problem is especially acute in communications networks that have to maintain high visibility and error-free operation.