1. Field of the Invention
This invention relates to the field of autonomic computing and operating systems. More specifically, this invention relates to a mechanism for hot-swapping code that manages operating system resources while providing continuity of service to applications.
2. Background Art
Operating systems are large and complex, and are expected to run across a varied set of hardware platforms, stay up for long periods of time, stay up to date with the latest fixes, and serve an increasingly diverse set of application workloads. In the past, operating system programmers have used several approaches to attempt to achieve some of these goals. Linux, for example, supports the ability to shut down a driver, load a new one, and start it up without having to reboot the system. Microsoft Windows created a Hardware Abstraction Layer to support running across different platforms. Programmers have used adaptive algorithms to attempt to adjust to varying application workloads. For some specific applications, programmers have been able to specially design hot-swapping approaches. None of these solutions addresses all of the issues, and even the issues they do address are not solved for the general case.
One approach, mentioned above, is referred to as hot-swapping: changing, or hot-swapping, operating system code while the system is still actively managing the resources for which the new, hot-swapped code is intended. In addition, because of the size and complexity of state-of-the-art operating systems, greater maintainability is needed. Hot-swapping not only solves the above-mentioned problems but addresses the need for maintainability as well.
Although there is a large body of prior work focusing on the downloading and dynamic binding of new components, there has been less work on the swapping of transparent, scalable components in an active system. For instance, “Dynamic C++ Classes: A Lightweight Mechanism to Update Code in a Running Program,” by Gisli Hjalmtysson and Robert Gray, Annual USENIX Technical Conference, June 1998, pp. 65-76, USENIX Association (Hjalmtysson and Gray), describes a mechanism for updating C++ objects in a running program. In the disclosed system, however, client objects need to be able to recover from bindings broken by an object swap and retry the operation, so the mechanism is not transparent to client objects. Moreover, this procedure does not detect a quiescent state, and old objects continue to service prior calls while the new object begins to service new calls.
Another procedure is disclosed in “Optimistic Incremental Specialization: Streamlining a Commercial Operating System,” by Calton Pu, Tito Autrey, Andrew Black, Charles Consel, Crispin Cowan, Jon Inouye, Lakshmi Kethana, Jonathan Walpole and Ke Zhang, ACM Symposium on Operating System Principles, Copper Mountain Resort, CO, Dec. 3-6, 1995, Operating Systems Review, vol. 29, no. 5 (Pu, et al.). This reference describes a replugging mechanism for incremental and optimistic specialization, but the reference assumes there can be at most one thread executing in a swappable module at a time. In later work, that constraint is relaxed, but the resulting mechanism does not scale.
In general, the prior art work described herein can be viewed as part of widespread research efforts to make operating systems more adaptive and extensible, as in SPIN, Exokernel, and VINO. These systems cannot swap entire components; they only provide hooks for customization. Work has also been done on adding extensibility to both applications and systems. CORBA, DCE, and RMI are all application architectures that allow components to be modified during program execution, but these architectures do not address the performance or complexity concerns present in an operating system.
There has been work to make operating systems more extensible. Other work has attempted to add hot-swapping capability at the application layer. However, no work provides a generic hot-swapping capability for operating systems that allows them to activate new code without stopping the service, to perform effectively under a varying set of workloads, and to ensure continuous availability.