1. Field of the Invention
The present invention relates to computer systems with high availability and to concurrent patches of computer programs. More particularly, the invention provides a method, a computer-implemented program and a device for replacing a computer program with a replacement version concurrently with its execution by a first instance of an operating system in a computer system.
2. Description of the Related Art
Methods for concurrently patching computer system firmware have been proposed in US 2006/0242491 A1 and US 2007/0006201 A1. With these methods, only a single firmware thread can be used on a processing unit, without pre-emption of threads. Such a firmware thread returns to a dispatcher before the actual patch processing starts. Stacks are thus empty and no function of a firmware thread is executing. This requires the program to be designed to return control frequently. Further, the allocation of thread data needs a persistent area of the main memory of the computer system. This main memory area must not be erased during the concurrent patch. Therefore, these methods are not applicable to multi-threaded operating systems with pre-emption of threads, for which a dynamic adaptation of thread stack content and the re-calculation of execution pointers for each function of a specific thread are required.
Another approach is described in a paper by the Massachusetts Institute of Technology titled “Ksplice: An automatic system for rebootless Linux kernel security updates” and authored by J. B. Arnold. There the current version of a function is patched by replacing the first instruction in the function with a function call to the new version of the function. This approach provides only limited support for semantic changes or the introduction of new functions in the patch.
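The redirection described above can be illustrated with a toy model. The following is a minimal sketch, not the implementation of the cited paper: it assumes a function can be represented as a list of instructions run by a small interpreter, and the names `run`, `patch`, `scale` and `scale_v2` as well as the `redirect` pseudo-instruction are hypothetical and introduced only for illustration.

```python
# Toy model: each "function" is a list of (operation, operand) instructions.
# ("redirect", name) is a hypothetical stand-in for the call that is written
# over the first instruction of the old function version.

def run(functions, name, arg):
    """Execute the named function on arg, following any entry redirect."""
    value = arg
    for op, operand in functions[name]:
        if op == "redirect":
            # Entry was patched: hand the original argument to the new version.
            return run(functions, operand, arg)
        if op == "add":
            value += operand
        elif op == "mul":
            value *= operand
    return value


def patch(functions, old_name, new_name, new_body):
    """Install new_body and overwrite the first instruction of old_name."""
    functions[new_name] = new_body
    functions[old_name][0] = ("redirect", new_name)


functions = {"scale": [("mul", 2), ("add", 1)]}
before = run(functions, "scale", 5)

# Callers keep calling "scale"; only its first instruction changes.
patch(functions, "scale", "scale_v2", [("mul", 3), ("add", 1)])
after = run(functions, "scale", 5)
```

In this sketch the caller never learns the name of the new version, which reflects why the approach works well for in-place fixes of existing functions. A patch that changes what callers must pass, or that relies on functions no existing caller invokes, is not covered by overwriting a function entry alone.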
Kexec is a technique to replace an operating system kernel, which is used in Linux® distributions such as SUSE® Linux®. Kexec allows the BIOS (Basic Input/Output System) stage, which is part of the boot procedure in many computer systems, to be skipped. But even with kexec, all the other steps of an operating system boot procedure need to be performed, which adds significant latency to a patch procedure.
Several methods to reduce the latency caused by an operating system boot procedure have been proposed. For example, most modern operating systems with advanced power management support a power-saving state in which all hardware except the main memory is powered off in order to save energy. When a computer system performs a suspend-to-RAM operation, the operating system stops applications, drivers and kernel in order, and stores all necessary information in the RAM (Random Access Memory) used as main memory of the computer system. When the computer system resumes, it retrieves the operating state from main memory and restores the whole system to the state it was in when it was suspended.
As described in J. Sun et al., “Supporting Multiple OSes with OS Switching”, Proc. of the 2007 USENIX Annual Technical Conference, pp. 357-362, such suspend/resume techniques can even be used to support the concurrent execution of multiple operating systems on a single computer.
High availability solutions also exist in which multiple computer systems and/or multiple logical computer system partitions are used. An example of such a solution is the VMware® Update Manager. In such solutions, a second instance of an operating system is either executed in parallel or booted before the migration from the first instance takes place. Therefore, additional resources are needed for the second instance, as well as some form of synchronization between the instances.
A combination of high-availability and suspend/resume techniques exists in VMware® VMotion, which allows migrating a virtual machine from one server computer system to another. Here, the entire state of a virtual machine is stored by a first server on a shared storage, from which a second server uses it to establish a second instance of the virtual machine. VMotion keeps the transfer period imperceptible to users by keeping track of on-going memory transactions in a bitmap. Once the entire memory and system state has been copied over to the second server, VMotion suspends the source virtual machine, copies the bitmap to the second server, and resumes the virtual machine on the second server. Since this approach requires virtual machines, it is not applicable to operating systems executed directly on the physical computer system or on a (logical) partition of the computer system. Further, a concurrent patch of the virtual machine management software needs to be handled differently.
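The bitmap-based transfer described above can be sketched as follows. This is a simplified illustration, not the actual VMotion implementation: it assumes that memory is a list of pages, that every write performed during the bulk copy sets a flag in the bitmap, and that the flagged pages are copied again once the source is suspended. The function `migrate` and its parameter `writes_during_copy` are hypothetical names.

```python
def migrate(source_pages, writes_during_copy):
    """Copy pages to a destination while tracking concurrent writes in a bitmap.

    writes_during_copy maps the index of a page being copied to a
    (page, value) write that the still-running source performs right after
    that page has been copied (an assumption made to simulate concurrency).
    """
    dirty = [False] * len(source_pages)        # one flag per page
    destination = [None] * len(source_pages)

    # Phase 1: bulk copy while the source virtual machine keeps running.
    for index in range(len(source_pages)):
        destination[index] = source_pages[index]
        if index in writes_during_copy:
            page, value = writes_during_copy[index]
            source_pages[page] = value
            dirty[page] = True                 # record the on-going write

    # Phase 2: the source is suspended, the bitmap is handed over, and
    # (by assumption) every page flagged in it is refreshed before resuming.
    recopied = [i for i, flag in enumerate(dirty) if flag]
    for i in recopied:
        destination[i] = source_pages[i]
    return destination, recopied


source = ["a", "b", "c", "d"]
# While page 2 is being copied, the source overwrites the already-copied page 0.
dest, recopied = migrate(source, {2: (0, "A")})
```

In this sketch only the pages flagged in the bitmap are handled while the source is suspended, so the length of the suspension depends on the number of late writes rather than on the total memory size.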