1. Field of the Invention
The present invention relates generally to an improved data processing system, and in particular to a computer implemented method, data processing system, and computer program product for enabling the restoration of in-flight file descriptors during a checkpoint operation.
2. Description of the Related Art
Most data processing systems use data integrity operations to ensure that the state of data in memory may be recreated in the event of a failure. A checkpoint operation is a data integrity operation in which the application state and memory contents for an application are written to stable storage at particular points in time, i.e., checkpoints, in order to provide a basis upon which to recreate the state of the application in the event of a failure. For example, during a typical checkpoint operation, an application's state and data are saved onto a network disk at various predefined points in time. When a failure occurs, a restart operation may be performed to roll back the state of the application to the last checkpoint, such that the application data may be restored from the values stored on the network disk.
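The checkpoint and restart cycle described above can be illustrated with a minimal sketch. It is not the checkpoint mechanism of any particular system: the function names `checkpoint`, `restart`, and `demo`, the JSON encoding, and the use of a local temporary directory in place of a network disk are all assumptions made for illustration.

```python
import json
import os
import tempfile


def checkpoint(state, path):
    """Write the application state to stable storage.

    The state is written to a temporary file, flushed to disk, and then
    renamed over the target so that a crash never leaves a partial checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def restart(path):
    """Roll the application state back to the last checkpoint."""
    with open(path) as f:
        return json.load(f)


def demo():
    """Run five steps, checkpoint after step 3, then simulate a failure."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.ckpt")  # hypothetical checkpoint file
        state = {"counter": 0}
        for step in range(1, 6):
            state["counter"] += step
            if step == 3:
                checkpoint(state, path)
        state = None  # simulated failure: the in-memory state is lost
        return restart(path)
```

Calling `demo()` returns the state as of the checkpoint taken after step 3, not the state reached after step 5, which is the roll-back behavior the paragraph describes.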
When a checkpoint operation has completed, sockets may be restored along with the data on these sockets. A UNIX domain socket is a socket used for communication between processes on the same UNIX system. For UNIX domain sockets, the data on these sockets may contain in-flight file descriptors, which are restored when the application is ready to read them. A file descriptor is in flight when it has been sent over a socket but has not yet been received by the destination process. A file descriptor is a value used by a process to identify an open file. The same file descriptor mechanism is used to read and write over a network, because a file descriptor may refer either to a file on a disk or to a socket that is open in the kernel. A file descriptor table, maintained by the kernel for each process, translates the file descriptor to the open file or socket. Each file descriptor entry in the file descriptor table includes a file pointer which references the location or address of the open file or socket.
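How a file descriptor comes to be in flight on a UNIX domain socket can be sketched as follows. The sketch uses the standard `SCM_RIGHTS` ancillary message through Python's `socket.sendmsg` and `socket.recvmsg`. The helper names `send_fd`, `recv_fd`, and `demo_in_flight` are hypothetical, and a pipe stands in for the open file. The sketch runs only on a UNIX-like system and does not implement any checkpoint or restore logic.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor; the kernel installs it as a new entry
    in the receiving process's file descriptor table."""
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole integer.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo_in_flight():
    """Put a descriptor in flight, drop the sender's copy, then receive it."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()  # the pipe stands in for an open file
    try:
        os.write(write_end, b"hello")
        send_fd(sender, read_end)
        # After this close, the only reference to the pipe's read end is the
        # message queued on the socket: the descriptor is in flight.
        os.close(read_end)
        new_fd = recv_fd(receiver)
        try:
            return os.read(new_fd, 5)
        finally:
            os.close(new_fd)
    finally:
        os.close(write_end)
        sender.close()
        receiver.close()
```

Between `os.close(read_end)` and `recv_fd(receiver)`, no process holds a descriptor for the read end of the pipe, yet the open file still exists in the kernel because the queued socket message refers to it. A checkpoint taken in that window would have to capture the file through the socket data rather than through any file descriptor table.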
A problem with current checkpoint and restore methods is that they do not handle, or allow for the restoration of, open files whose file descriptors are in flight.