1. Field of the Invention
The present invention relates generally to data processing systems and, in particular, to a method and apparatus for checkpoint operations. More particularly, the present invention is directed to a computer implemented method, apparatus, and computer usable program code for checkpointing and restarting modules in a workload partitioned environment.
2. Description of the Related Art
A workload partition, also referred to as a software partition, is a virtualized operating system environment within a single instance of the operating system. A single instance of the operating system can be partitioned into multiple such virtual operating system environments, each of which is a workload partition. An example of a workload partition is the AIX® workload partition (WPAR), a product available from International Business Machines (IBM®) Corporation.
Software running within each workload partition appears to have its own separate instance of the operating system. A workload partition may include one or more processes. Processes in a workload partition are completely isolated from, and are not allowed to interact with, processes in other workload partitions on the same system.
A workload partition, including any applications or other processes running in the partition, may be migrated from one physical computing device to another while still active. In other words, migration of software partitions allows a user to move a set of active applications from one computing device to a different computing device. In this manner, a user can select a set of applications to move to the different computing device without transferring every application running on the original computing device.
Migration of a software partition involves checkpointing the state of every application process in the workload partition that is to be moved, on the source computing device, to form checkpoint data. The state of every targeted application process in the migrated workload partition may then be restored on the destination computing device using the checkpoint data.
A checkpoint operation is a data integrity operation in which the application state of an application process running on the kernel is written to stable storage at particular points in time. The saved state provides a basis upon which to recreate the state of the application in the event of a failure and/or migration of the application to another data processing system.
During a checkpoint operation, an application's state and data may be saved onto a local disk or a network disk at various pre-defined points in time to generate checkpoint data. When a failure occurs in the data processing system and/or when the application is migrated to a different data processing system, a restart operation may be performed using the checkpoint data to restore the state of the application to the last checkpoint. In other words, the application data may be restored from the checkpoint values stored on the disk.
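For illustration only, the generic checkpoint and restart cycle described above may be sketched in Python. The sketch assumes that the application state fits in a JSON-serializable dictionary; the function names checkpoint and restart, the file name app_state.json, and the checkpoint interval are hypothetical and are not part of any AIX or WPAR interface. Writing to a temporary file, flushing it with fsync, and then atomically renaming it over the previous checkpoint is one common way to ensure that stable storage always holds a complete checkpoint rather than a partially written one.

```python
import json
import os
import tempfile


def checkpoint(state: dict, path: str) -> None:
    """Write application state to stable storage (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename below is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # force the data onto the storage device
        # Replace the old checkpoint in one step: readers see the old or the new file, never a partial one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # discard the incomplete temporary file
        raise


def restart(path: str) -> dict:
    """Restore application state from the last checkpoint (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "app_state.json")
        state = {"step": 0, "total": 0}
        for step in range(1, 8):  # the application "fails" after step 7
            state["step"] = step
            state["total"] += step
            if step % 5 == 0:  # hypothetical pre-defined checkpoint point
                checkpoint(state, ckpt)
        # The in-memory state is lost on failure; restart from the last checkpoint.
        print(restart(ckpt))  # {'step': 5, 'total': 15}
```

Because the simulated failure occurs after step 7 but the last checkpoint was taken at step 5, the restart operation restores the state recorded at step 5, and the work done in steps 6 and 7 would have to be redone.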