The demand on packet-based communications networks continues to grow. To provide better performance, many network nodes, such as routers and switches, have been designed with distributed architectures that integrate independent computer systems, each optimized to perform specific tasks. For example, a chassis-based router or switch may include a control module card that provides central control functions and multiple port interface cards that provide the interface to other network nodes. The control module and port interfaces are each independent computer systems that have their own central processing unit (CPU), operating system, and software applications. The software applications that are supported by the respective CPUs “reside on” the respective independent computer systems and include applications such as Layer 2 (L2) management, Layer 3 (L3) management, link aggregation control protocol (LACP), spanning tree protocol (STP), multiprotocol label switching (MPLS), etc. The applications that reside on the independent computer systems of the network node are themselves independent. That is, each application can operate independently of other applications that reside on the same independent computer system and of applications that reside on different independent computer systems.
Although each application is a stand-alone application that can operate independently of other applications, many applications that reside on different independent computer systems of a network node rely on each other for information. For example, L3 management applications residing on the control module and the port interfaces exchange routing information that is used to learn routes and manage routing tables. Network nodes with distributed architectures use an interprocess communications (IPC) protocol to communicate between applications that reside on the different independent computer systems. For example, these network nodes often use a message-type IPC protocol that relies on formatted messages to exchange information between applications. For a message-type IPC protocol to be successful, the applications must use IPC message structures that are compatible with the active versions of the corresponding applications.
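The dependence on compatible message structures can be illustrated with a minimal sketch. The header layout, the field names, the message type constant, and the set of supported versions below are all hypothetical and are not taken from any particular network node or IPC protocol. The sketch assumes only that each formatted message begins with a fixed header that identifies the structure version, and it shows how a receiver built for one structure version cannot safely interpret a message built for another.

```python
import struct

# Hypothetical fixed header: structure version (1 byte), message type (1 byte),
# payload length (2 bytes), all in network byte order.
HEADER = struct.Struct("!BBH")

# Hypothetical message type used by an L3 management application.
MSG_ROUTE_UPDATE = 1

# Structure versions this receiver was built to understand (an assumption).
SUPPORTED_VERSIONS = frozenset({1})


def pack_message(version: int, msg_type: int, payload: bytes) -> bytes:
    """Build a formatted IPC message: fixed header followed by the payload."""
    return HEADER.pack(version, msg_type, len(payload)) + payload


def unpack_message(data: bytes, supported=SUPPORTED_VERSIONS):
    """Decode a formatted IPC message, rejecting unknown structure versions."""
    if len(data) < HEADER.size:
        raise ValueError("message shorter than header")
    version, msg_type, length = HEADER.unpack_from(data)
    if version not in supported:
        # The receiver does not know this layout, so it cannot decode the payload.
        raise ValueError(f"unsupported IPC message version {version}")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, msg_type, payload


if __name__ == "__main__":
    same_version = pack_message(1, MSG_ROUTE_UPDATE, b"10.0.0.0/8")
    print(unpack_message(same_version))

    upgraded_sender = pack_message(2, MSG_ROUTE_UPDATE, b"10.0.0.0/8")
    try:
        unpack_message(upgraded_sender)
    except ValueError as err:
        print("rejected:", err)
```

In this sketch the receiver rejects the message explicitly. A receiver without such a check would instead interpret the payload using the wrong layout, which is one way an incompatibility between application versions could lead to incorrect behavior.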
In network nodes with distributed architectures, there may be situations when it is desirable to upgrade an application that resides on one of the independent computer systems without upgrading the corresponding applications that reside on the other independent computer systems. However, in most network nodes with a distributed architecture, application upgrades are an “all or nothing” proposition. That is, all instances of an application that reside on the different independent computer systems of a network node must be upgraded together to maintain compatibility and to prevent any of the instances from crashing due to incompatibilities between the application versions. For example, upgrading the L3 management application that resides on one port interface triggers a need to upgrade the corresponding application that resides on the control module, which in turn triggers a need to upgrade the corresponding application that resides on the other port interfaces. Because of the domino effect that is triggered by a single application upgrade, it is difficult to implement an application upgrade on one of the independent computer systems without experiencing some network down time.
One technique that has been used to avoid network down time as a result of application upgrades involves providing complete redundancy for each of the independent computer systems. Complete redundancy for each of the independent computer systems allows application upgrades to be performed in the background on all of the redundant systems. Once an application upgrade is complete on all of the redundant systems, the redundant systems, which include the upgraded applications, are placed into service as the primary systems. Although complete redundancy enables application upgrades to be achieved without incurring network down time, providing complete redundancy is costly and does not provide much flexibility.
In view of this, what is needed is a technique for managing applications in a network node with a distributed architecture that avoids down time and that provides flexibility with application upgrades.