This invention relates to the field of data processing, and, more particularly, to improvements in a method for dynamically making software changes in a running system.
There are commercially available data processing systems, such as IBM ESA/390 data processing systems, which operate with many resident programs or modules, such as those of the commercially available IBM MVS/ESA operating system. ("IBM", "ESA/390", and "MVS/ESA" are trademarks of International Business Machines Corporation.) When a system is running, such resident modules are accessible to each other in many different ways, and multiple tasks and processes can independently access the programs. From time to time, various operating system modules are updated, and it becomes necessary to substitute new versions for the old versions. The problem thus exists of how to effect non-disruptive replacement while the system is running, given the complex environment in which one or more different processes are concurrently using the programs being replaced.
The general problem is known and has been recognized in the prior art. A paper, "Change Programming in Distributed System", by G. Etzkorn, International Workshop on Configurable and Distributed Systems, pages 140-151, London, UK, Mar. 25-27, 1992, describes a method of dynamically reconfiguring programs in a system in which the programs communicate by message passing between ports. Reconfiguration occurs only when the system has reached a "reconfiguration state", and the system stays in that state while the changes are applied. The method requires a first series of reconfiguration commands that place the system in the reconfiguration state, followed by a series of change commands that effect the change. A change is made by reconfiguring an old version out of the system and configuring a new version into the system. The invention differs from such a system in several ways, but the major points of distinction are as follows. First, the invention is not based on message passing but upon the use of entry points and safety points and the normal interaction of processes with the programs to be changed. Second, in the invention, both old and new programs may be executed concurrently via multitasking, whereas the system described in the paper completely reconfigures an old program out of the way.
Another paper, "Dynamic Program Modification in Telecommunication Systems", by O. Frieder et al., Proceedings of the IEEE SEVENTH CONFERENCE ON SOFTWARE ENGINEERING FOR TELECOMMUNICATION SWITCHING SYSTEMS, pages 168-172, 1989, proposes a solution for a subset of the problems in a distributed telecommunications environment. The updating process described in this paper replaces programs having plural procedures, one procedure at a time. "The updating system interrupts the program and examines the current state of its runtime stack. Based on this information and the list of all procedures that each procedure can call (generated by the language compiler), the updating system calculates when each procedure may be updated. Updating a procedure involves changing its binding from its current version to the new version. When all procedures have been replaced by their new version, the program update is complete." (pg. 169) A "procedure . . . that has changed between versions may be updated only when it is not active." In contrast, the invention updates active tasks, uses entry points and safety points, and does not examine any stack. The invention allows for concurrent execution of the old program and the new program by multiple tasks. Also, the invention does not require interception of every program exit.
The invention involves the use of "safety points", which are system-observable events and conditions. These events and conditions control the routing of tasks to the old program or to the new program. One may relate the concept of a safety point in a program to a sync point in a database (DB) transaction. All DB changes must be permanently written to the database once a sync point is reached; all of them should be backed out if the transaction aborts prior to reaching a sync point; and, in some database managers, none of the changes are visible to other transactions until a sync point has been reached. The differences between the DB sync point and the program change safety points are:
Safety points are chosen anew with each change to the program. Sync points usually remain the same, even if the program flow or the database structure changes.
Safety points most often reside in modules which are not being changed, while sync points are often embedded in the program constituting the transaction.
Sync points are either explicit (system call), or trivially implicit (end of transaction implies a sync point). Safety points are explicit, but cannot be observed in the program. They must be specified externally to the program. It is not possible to code a system call saying "This task is now Safe for any change".
Safety points lose their meaning when the change is fully applied or the system is restarted with all new modules. Though the code in and around the safety point continues to execute, it bears no further significance as a safety point. The sync point is part of the ongoing logical significance of the program.
These differences also apply when comparing the concept of safety points and the concepts related to Checkpoint-Restart, and the points in time when the latter can be performed.
1. The method should handle arbitrarily unstructured code, which may be called concurrently by multiple processes, using any method of call which is physically possible with the underlying machine architecture.
2. The running code (the old version which is being changed) should not have required or otherwise undergone a restructure, a rewrite or other modification in order to position it for the dynamic change at hand. An "ordinary" change should be applied to "ordinary" and existing code with the help of an external facility, and with the help of an administrative process.
3. Process blocking (quiescing) during implementation of a dynamic change must be kept to a minimum. Deadlocks are prohibited.
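As an illustration only, the role of safety points in routing tasks between the old and the new program might be sketched as follows. The sketch is not the mechanism of the invention. It assumes, purely for illustration, that an external facility records when each task has been observed at a safety point, and that a task entering through an entry point is routed to the new program only after that observation. The names `ChangeRouter`, `observe_safety_point`, and `call_entry_point` are hypothetical and do not appear in the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


def old_program(x: int) -> str:
    # Stand-in for the old version of a resident module.
    return f"old:{x}"


def new_program(x: int) -> str:
    # Stand-in for the new version of the same module.
    return f"new:{x}"


@dataclass
class ChangeRouter:
    """Hypothetical external facility that routes calls made at an entry point."""

    old: Callable[[int], str]
    new: Callable[[int], str]
    # Tasks that have been observed at a safety point (assumed bookkeeping).
    safe_tasks: Set[str] = field(default_factory=set)

    def observe_safety_point(self, task_id: str) -> None:
        # The safety point is specified outside the program being changed;
        # the program itself issues no "I am now safe" call.
        self.safe_tasks.add(task_id)

    def call_entry_point(self, task_id: str, x: int) -> str:
        # Assumed rule: safe tasks run the new program, all others the old one.
        target = self.new if task_id in self.safe_tasks else self.old
        return target(x)


router = ChangeRouter(old_program, new_program)
first = router.call_entry_point("task-A", 1)   # task-A not yet safe
router.observe_safety_point("task-A")
second = router.call_entry_point("task-A", 2)  # task-A now routed to new code
third = router.call_entry_point("task-B", 3)   # task-B still uses old code
```

In this sketch the old and the new program remain callable at the same time by different tasks, which mirrors the concurrent execution described above. Once every task has been observed at a safety point, the recorded set no longer affects routing, consistent with the remark that safety points lose their meaning when the change is fully applied.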