The majority of Internet outages are directly attributable to software upgrade issues and software quality in general. Mitigation of network downtime is a constant battle for service providers. In pursuit of "five 9's availability" or 99.999% network up time, service providers must minimize network outages due to equipment (i.e., hardware) and all too common software failures. Service providers not only incur downtime due to failures, but also incur downtime for upgrades to deploy new or improved software, hardware, software or hardware fixes or patches that are needed to deal with current network problems. A network outage can also occur after an upgrade has been installed if the upgrade itself includes undetected problems (i.e., bugs) or if the upgrade causes other software or hardware to have problems. Data merging, data conversion and untested compatibilities contribute to downtime. Upgrades often result in data loss due to incompatibilities with data file formats. Downtime may occur unexpectedly days after an upgrade due to lurking software or hardware incompatibilities. Often, the upgrade of one process results in the failure of another process. This is often referred to as regression. Sometimes one change can cause several other components to fail; this is often called the "ripple" effect. To avoid compatibility problems, multiple versions (upgraded and not upgraded versions) of the same software are not executed at the same time.
Most computer systems are based on inflexible, monolithic software architectures that consist of one massive program or a single image. Though the program includes many subprograms or applications, when the program is linked, all the subprograms are resolved into one image. Monolithic software architectures are chosen because writing subprograms is simplified since the locations of all other subprograms are known and straightforward function calls between subprograms can be used. Unfortunately, the data and code within the image are static and cannot be changed without changing the entire image. Such a change is termed an upgrade and requires creating a new monolithic image including the changes and then rebooting the computer to cause it to use the new image. Thus, to upgrade, patch or modify the program requires that the entire computer system be shut down and rebooted. Shutting down a network router or switch immediately affects the network up time or "availability". To minimize the number of reboots required for software upgrades and, consequently, the amount of network down time, new software releases to customers are often limited to a few times a year at best. In some cases, only a single release per year is feasible. In addition, new software releases are also limited to a few times a year due to the amount of testing required to release a new monolithic software program. As the size and complexity of the program grow, the amount of time required to test and the size of the regression matrix used to test the software also grow. Forcing more releases each year may negatively affect software quality, as not all bugs may be detected.
If the software is not fully tested and a bug is not detected (or a bug goes undiscovered even after extensive testing) and the network device is rebooted with the new software, more network down time may be experienced if the device crashes due to the bug, or if the device causes other devices on the network to have problems so that it and the other devices must be brought down again for repair or another upgrade to fix the bug. In addition, after each software release, the size of the monolithic image increases, leading to a longer reboot time. Moreover, a monolithic image requires contiguous memory space, and thus, the computer system's finite memory resources will limit the size of the image.
Unfortunately, limiting the number of software releases also delays the release of new hardware. New hardware modules, usually ready to ship between "major" software releases, cannot be shipped more than a few times a year since the release of the hardware must be coordinated with the release of new software designed to upgrade the monolithic software architecture to run the new hardware.
An additional and perhaps less obvious issue faced by customers is encountered when customers need to scale and enhance their networks. Typically, new and faster hardware is added to increase bandwidth or add computing power to an existing network. Under a monolithic software model, since customers are often unwilling to run different software revisions in each network element, customers are forced to upgrade the entire network. This may require shutting down and rebooting each network device.
"Dynamic loading" is one method used to address some of the problems encountered with upgrading monolithic software. The core or kernel software is loaded on power-up but the dynamic loading architecture allows each application to be loaded only when requested. In some situations, instances of these software applications may be upgraded without having to upgrade the kernel and without having to reboot the system ("hot upgrade"). Unfortunately, much of the data and code required to support basic system services, for example, event logging and configuration, remains static in the kernel. Application program interface (API) dependencies between dynamically loaded software applications and kernel resident software further complicate upgrade operations. Consequently, many application fixes or improvements, and new hardware releases, require changes to the kernel code which, similar to monolithic software changes, require updating the kernel and shutting down and rebooting the computer.
In addition, processes in monolithic images and those which are dynamically loadable typically use a flat (shared) memory space programming model. If a process fails, it may corrupt memory used by other processes. Detecting and fixing corrupt memory is difficult and, in many instances, impossible. As a result, to avoid the potential for memory corruption errors, when a single process fails, the computer system is often re-booted.
All of these problems impede the advancement of networks, a situation that is completely incongruous with the accelerated need and growth of networks today.
In one aspect of the invention, a distributed redundancy design is disclosed to minimize network outages and other problems associated with component/process failures by spreading software backup (in the so-called "hot state") across multiple elements. In one embodiment, a 1:N redundancy design can be implemented in which a single backup process is used to back up multiple (N) primary processes.
For software backup alone, the distributed redundancy architecture of the present invention eliminates the need for hardware backup elements (e.g., spare hardware). Where hardware backup is also provided, spreading resource demands across multiple elements makes it possible to have significant (perhaps full) hot state backup without the need for a "mega spare" (e.g., a more powerful processor and additional memory). Identical backup (spare) and primary hardware provides manufacturing advantages and customer inventory advantages. The distributed redundancy architecture of the present invention permits the location of the hardware backup element to float, that is, if a primary element fails, its functions can be transferred over to the backup element. When the failed primary element is replaced, the replacement hardware can serve as the hardware backup.
In one embodiment, a distributed redundancy system is disclosed that provides software redundancy (backup) with or without redundant (backup) hardware, for example, with or without using a backup line card. In one embodiment, the computer system can employ additional primary line cards. In order to load instances of software applications, the Network Management Software (NMS) creates software load records (SLRs) in a configuration database. A typical SLR includes the name of a control shim executable file and a logical identification (LID) associated with a primary line card on which the application is to be spawned.
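By way of illustration only, a software load record of the kind described above might be modeled as follows. This is a minimal sketch, not the disclosed implementation: the class name, field names, executable names and LID values are all hypothetical, and the configuration database is stood in for by a plain list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareLoadRecord:
    """One SLR: which control shim to run, and on which primary line card."""
    control_shim: str  # name of the control shim executable file (hypothetical values below)
    lid: int           # logical identification (LID) of the target primary line card


def shims_for_card(records, lid):
    """Return the control shim names to be spawned on the card with this LID."""
    return [r.control_shim for r in records if r.lid == lid]


# Hypothetical contents of the configuration database as written by the NMS.
config_db = [
    SoftwareLoadRecord("atm_cntrl.exe", 30),
    SoftwareLoadRecord("mpls_cntrl.exe", 30),
    SoftwareLoadRecord("atm_cntrl.exe", 31),
]

print(shims_for_card(config_db, 30))
```

Keying each record on a logical rather than physical identifier is what would allow the same record to follow an application to different hardware.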
For example, one or more Asynchronous Transfer Mode (ATM) protocol controllers are sent records from a Group Table (GT) indicating how many instances of ATM each must start on their associated line cards. The Group Table can include a primary line card and a backup line card such that, in addition to starting primary instances of ATM, each primary line card also executes backup instances of ATM. For example, an ATM controller can receive records from a group table including logical identifiers (LIDs). In response to such records, the ATM controller starts a number of primary instantiations of ATM and a comparable number of backup instantiations of ATM as backup for four primary instantiations. Similarly, another ATM controller can receive records from another group table and, in response thereto, start primary and backup instantiations of ATM.
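The way a controller might turn group table records into primary and backup instantiations can be sketched as follows. The record layout (a group number plus a primary LID and a backup LID) and the LID values are assumptions made for illustration; the text above states only that the records include logical identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupRecord:
    group: int        # hypothetical group number identifying one ATM instance
    primary_lid: int  # LID of the line card that runs the primary instance
    backup_lid: int   # LID of the line card that runs the backup instance


def plan_instances(group_table, my_lid):
    """Split group table records into the primaries and backups this card starts."""
    primaries = [g.group for g in group_table if g.primary_lid == my_lid]
    backups = [g.group for g in group_table if g.backup_lid == my_lid]
    return primaries, backups


# Two line cards, four primaries each; each card backs up the other's primaries.
group_table = (
    [GroupRecord(g, 30, 31) for g in range(1, 5)]
    + [GroupRecord(g, 31, 30) for g in range(5, 9)]
)

primaries, backups = plan_instances(group_table, 30)
print(primaries, backups)
```

In this arrangement no line card is dedicated to backup, which is the sense in which software backup is spread across multiple elements.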
Each primary instantiation registers with its local name server, and each backup instantiation subscribes to its local name server for information about its corresponding primary instantiation. The name server passes each backup instantiation at least the process identification number assigned to its corresponding primary instantiation, and with this, the backup instantiation can send a message to the primary instantiation to set up a dynamic state check-pointing procedure. Periodically or asynchronously as state changes, the primary instantiation passes dynamic state information to the backup instantiation (i.e., check-pointing).
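The register, subscribe and check-point sequence can be sketched in a single process as follows. This is a toy model under stated assumptions: the class and method names are hypothetical, direct method calls stand in for inter-process messages, and a shared dictionary stands in for addressing a process by its process identification number.

```python
class NameServer:
    """Toy local name server mapping a process name to its process ID."""

    def __init__(self):
        self._pids = {}
        self._subscribers = {}

    def register(self, name, pid):
        self._pids[name] = pid
        for callback in self._subscribers.get(name, []):
            callback(pid)

    def subscribe(self, name, callback):
        self._subscribers.setdefault(name, []).append(callback)
        if name in self._pids:  # primary already registered: notify at once
            callback(self._pids[name])


class Primary:
    def __init__(self, pid):
        self.pid = pid
        self.state = {}
        self.backup = None

    def setup_checkpointing(self, backup):
        self.backup = backup
        backup.checkpoint(dict(self.state))  # initial full copy

    def update(self, key, value):
        self.state[key] = value
        if self.backup is not None:
            self.backup.checkpoint({key: value})  # check-point on each state change


class Backup:
    def __init__(self, processes):
        self.processes = processes  # pid -> Primary; stands in for messaging by PID
        self.saved = {}

    def on_primary_pid(self, pid):
        self.processes[pid].setup_checkpointing(self)

    def checkpoint(self, delta):
        self.saved.update(delta)


processes = {}
ns = NameServer()
primary = Primary(pid=101)
processes[primary.pid] = primary
backup = Backup(processes)

ns.subscribe("atm-1", backup.on_primary_pid)  # backup subscribes first
ns.register("atm-1", primary.pid)             # name server passes the PID on
primary.update("vc-7", "established")
print(backup.saved)
```

The sketch check-points on every change; a periodic variant would instead batch changes and send them on a timer.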
In a further aspect of the invention, a Redundancy Manager Service can be used to allow backup and primary instantiations to pass dynamic state information. If the primary instantiation fails, it can be re-started, retrieve its last known dynamic state from the backup instantiation and then initiate an audit procedure to resynchronize with other processes. The retrieval and audit process will normally be completed very quickly, resulting in no discernable service disruption.
Although each line card in the examples described below can be instructed by the group table to start four instantiations of ATM, this is by way of example only. The user could instruct the NMS to set up the group table to have each line card start one or more instantiations and to have each line card start a different number of instantiations.
If one or more of the primary processes on a particular element experiences a software fault, the processor on the line card may terminate and restart the failing process or processes. Once the process or processes are restarted, a copy of the last known dynamic state (i.e., the backup state) can be retrieved from corresponding backup processes executing on a line card, and an audit process can be initiated to synchronize the retrieved state with the dynamic state of associated other processes. The backup state represents the last known active or dynamic state of the process or processes prior to termination, and retrieving this state from a line card allows the restarted processes on the line card to quickly resynchronize and continue operating. The retrieval and audit process will normally be completed very quickly, and in the case of a network device, quick resynchronization may avoid losing network connections, resulting in no discernable service disruption.
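The retrieve-then-audit step can be sketched as follows. The state format (connection identifier mapped to a status string) and the rule that the associated processes' view wins on disagreement are assumptions made for illustration; the text does not specify how an audit resolves differences.

```python
def restart_and_resync(backup_state, peer_view):
    """Rebuild a restarted primary's state from backup, then audit it against peers.

    backup_state: last check-pointed state held by the backup process.
    peer_view:    what associated processes currently report (hypothetical format,
                  connection id -> status).
    Returns (state, corrections), where corrections maps each connection that
    had to change to its new status, or to None if it was dropped.
    """
    state = dict(backup_state)  # retrieve the last known dynamic state
    corrections = {}

    # Audit: find entries where the check-point is stale or missing.
    for conn, status in peer_view.items():
        if state.get(conn) != status:
            corrections[conn] = status

    # Audit: find entries the peers no longer know about.
    for conn in state:
        if conn not in peer_view:
            corrections[conn] = None

    # Apply the corrections (assumption: the peers' view wins).
    for conn, status in corrections.items():
        if status is None:
            state.pop(conn, None)
        else:
            state[conn] = status
    return state, corrections


state, corrections = restart_and_resync(
    {"vc-1": "up", "vc-2": "up", "vc-3": "up"},
    {"vc-1": "up", "vc-2": "down", "vc-4": "up"},
)
print(state)
print(corrections)
```

Only the entries that changed after the last check-point need correcting, which is why a recent backup state keeps the audit short.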
If, instead of restarting a particular application, the software fault experienced by a line card requires the entire element to be shut down and rebooted, then all of the processes executing on the line card will be terminated, including the backup processes. When the primary processes are restarted, backup state information can be retrieved from backup processes executing on a second line card. Simultaneously, the restarted backup processes on the first line card again initiate the check-pointing procedure with primary ATM processes executing on another line card serving as backup processes for these primary processes. In addition, each primary process may be backed-up by one or more backup processes executing on one or more of the other line cards.
Since the operating system assigns each process its own memory block, each primary process can be backed-up by a backup process running on the same line card. This would minimize the time required to retrieve backup state and resynchronize if a primary process fails and is restarted. In one embodiment, a computer system is disclosed that includes a spare or backup line card, and the backup state is saved on another line card such that in the event of a hardware fault, the backup state is not lost and can be copied from the other line card. If memory and processor limitations permit, backup processes may run simultaneously on the same line card as the primary process and on another line card such that software faults are recovered from using local backup state and hardware faults are recovered from using remote backup state.
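The choice between local and remote backup state can be sketched as a small selection function. The fault labels and the function name are hypothetical, and the fallback from local to remote state when no local backup exists is an assumption rather than something the text spells out.

```python
def choose_backup_source(fault, local_backup, remote_backup):
    """Pick where a restarted primary process retrieves its backup state from.

    fault:         "software" or "hardware" (hypothetical labels).
    local_backup:  state held by a backup process on the same line card, or None.
    remote_backup: state held by a backup process on another line card, or None.
    """
    # A software fault leaves the local backup's separate memory block intact,
    # so the nearest copy gives the fastest retrieval.
    if fault == "software" and local_backup is not None:
        return "local", local_backup
    # After a hardware fault the local copy is gone with the card.
    if remote_backup is not None:
        return "remote", remote_backup
    raise LookupError("no backup state available")


print(choose_backup_source("software", {"vc-1": "up"}, {"vc-1": "up"}))
print(choose_backup_source("hardware", {"vc-1": "up"}, {"vc-1": "up"}))
```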
Where limitations on processing power or memory make full hot state backup impossible or impractical, only certain hot state data will be stored as backup. The level of hot state backup is inversely proportional to the resynchronization time, that is, as the level of hot state backup increases, resynchronization time decreases. For a network device, backup state may include critical information that allows the primary process to quickly resynchronize.
Critical information for a network device can include connection data relevant to established network connections (e.g., call set up information and virtual circuit information). For example, after primary ATM applications executing on one line card establish network connections, those applications can send critical state information relevant to those connections to backup ATM applications executing on another line card. Retrieving connection data allows the hardware (i.e., the first line card) to send and receive network data over the previously established network connections, preventing these connections from being terminated/dropped.
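Where only part of the hot state can be backed up, one possible way to decide what to check-point is to rank state entries by criticality and keep as many as the backup can hold. The sketch below assumes a simple entry-count budget and a hypothetical priority table in which connection data ranks first; neither is taken from the text.

```python
def select_hot_state(state, priority, budget):
    """Choose which state entries to check-point when only `budget` entries fit.

    state:    full dynamic state of the primary process (key -> value).
    priority: key -> rank, lower is more critical (hypothetical ranking);
              keys without a rank are treated as least critical.
    budget:   maximum number of entries the backup can store.
    """
    ranked = sorted(state, key=lambda k: priority.get(k, float("inf")))
    return {k: state[k] for k in ranked[:budget]}


state = {"stats": 1, "vc_table": 2, "call_setup": 3}
priority = {"call_setup": 0, "vc_table": 1}  # connection data ranks first

print(select_hot_state(state, priority, budget=2))
```

Raising the budget moves more state into the backup, so less has to be rebuilt through auditing after a restart, consistent with the inverse relationship described above.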
Although redundant ATM applications are described above, this is by way of example only. Any application (e.g., IP or MPLS), process (e.g., MCD or NS) or device driver (e.g., port driver) may have a backup process started on the same or another line card to store a backup record through the check-pointing procedures of the present invention.