As distributed processing systems have become more complex, customer requirements for system availability have also become more stringent. Distributed processing systems include multiplicities of nodes (e.g., on the order of hundreds to thousands), each node including a processor and various support modules. A distributed processing system can require multiple megabytes of control code to enable efficient functioning of the system. As function is added to such a system, control code sizes can grow to tens of megabytes. Such large control codes invariably require changes, updates, alterations, etc. If a distributed processing system must be placed out of service each time a code change is installed, a customer's use of the system is significantly disrupted.
The prior art describes various techniques for enabling installation of an update to a control code while maintaining some level of system operability. U.S. Pat. No. 5,155,837 to Liu et al. describes a time-shared multi-processor system wherein either application programs or operating system programs can be retrofitted without service interruption. Processors in the system are divided into two logical partitions. The old version of the software runs in one partition while the new version is loaded into and started up in the other partition. When the new version is verified to be properly operating, data traffic is transferred from the old version partition to the new version partition in two steps. First, the input data is switched to the new version. When the transactions in process in the old version are all completed, the output data is switched from the old version to the new version.
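The two-step transfer described above can be illustrated with a minimal sketch. It is not taken from the Liu et al. patent; the `Partition` and `Switchover` classes and all of their method names are hypothetical, and transactions are modeled as plain strings. The sketch only shows the ordering constraint: input is redirected first, and output is redirected only after the old partition has drained its in-process transactions.

```python
class Partition:
    """One logical partition running a single software version (hypothetical model)."""

    def __init__(self, version):
        self.version = version
        self.in_flight = []  # transactions accepted but not yet completed

    def accept(self, txn):
        self.in_flight.append(txn)

    def complete_one(self):
        # Finish the oldest in-flight transaction and return its result.
        txn = self.in_flight.pop(0)
        return f"{txn} done by {self.version}"


class Switchover:
    """Moves traffic from the old partition to the new one in two steps."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.input_target = old    # partition that receives new transactions
        self.output_source = old   # partition whose results are emitted

    def submit(self, txn):
        self.input_target.accept(txn)

    def switch_input(self):
        # Step 1: new transactions now go to the new version.
        self.input_target = self.new

    def try_switch_output(self):
        # Step 2: allowed only after step 1 and once the old partition is drained.
        if self.input_target is self.new and not self.old.in_flight:
            self.output_source = self.new
            return True
        return False

    def emit(self):
        # Emit the next result from the current output source, or None if idle.
        if self.output_source.in_flight:
            return self.output_source.complete_one()
        return None
```

In this model, a transaction submitted before `switch_input()` is still completed and emitted by the old version, and `try_switch_output()` keeps returning `False` until that transaction has finished.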
U.S. Pat. No. 5,210,854 to Beaverton et al. describes a system for updating programs stored in a programmable read only memory. During an updating procedure, a new version of a subroutine is stored in a free area of the programmable read only memory. Such storage occurs after a control device has partitioned the firmware resident in the programmable read only memory to prevent writing to protected partitions of the system's firmware. Transfer vectors are used to provide indirect addressing of subroutines resident in the firmware. After the updated version of a subroutine is stored, the transfer vector pointing to the old version of the subroutine is updated to indicate the new version.
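The transfer-vector scheme can be sketched as follows. This is an illustrative model only, not the mechanism of the Beaverton et al. patent: the `Firmware` class, its slot-based memory layout, and its method names are all assumptions made for the example. Callers reach a subroutine only through its vector entry, so an update consists of writing the new version into a free, unprotected area and then repointing the vector.

```python
class Firmware:
    """Toy model of a memory with protected areas and a transfer-vector table."""

    def __init__(self, resident, free_slots):
        self.slots = []         # storage areas, each holding one subroutine or None
        self.protected = set()  # indices of areas that must not be overwritten
        self.vectors = {}       # subroutine name -> index of its current version
        for name, routine in resident.items():
            self.slots.append(routine)
            index = len(self.slots) - 1
            self.protected.add(index)
            self.vectors[name] = index
        self.slots.extend([None] * free_slots)

    def write(self, index, routine):
        # Writes to protected areas are refused.
        if index in self.protected:
            raise PermissionError(f"area {index} is protected")
        self.slots[index] = routine

    def update(self, name, routine):
        # Store the new version in a free area first, then repoint the vector.
        free = next(
            (i for i, r in enumerate(self.slots)
             if r is None and i not in self.protected),
            None,
        )
        if free is None:
            raise MemoryError("no free unprotected area")
        self.write(free, routine)
        self.vectors[name] = free

    def call(self, name, *args):
        # Indirect call: look up the vector, then run whatever it points to.
        return self.slots[self.vectors[name]](*args)
```

Because the old version is never overwritten in this sketch, a call made before the vector is repointed still reaches a complete subroutine, and a call made afterward reaches the new one.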
In sum, microcode changes in older processing systems generally necessitated shutting down the machine, resulting in customer disruption. In more recent products, where there are two identical clusters of nodes or machines, a microcode change is activated in one cluster at a time. While one half of the system is being updated, the other half continues to operate independently. However, communications between the two sides of the system are disconnected while the two sides are at different change levels.
It is accordingly an object of this invention to enable code updates in a multi-nodal system, wherein communications between nodes that are at different levels of code change are enabled.
It is another object of this invention to provide a multi-nodal system with apparatus for installing control code updates, wherein the multi-nodal system continues operation during the installation process.
It is yet another object of this invention to provide a method and apparatus for installing code updates on a multi-nodal system wherein code revisions are installed in accordance with a predetermined sequence.