Intercepting system calls is sometimes necessary in order to perform special processing in addition to the normal processing of the call. One example is a user-level checkpoint/restart mechanism, which must keep track of the files that are in use by a program. Checkpointing facilitates recovery of a process by recording intermediate states of the process at particular intervals. This enables the process to be restarted from the last checkpoint, rather than from the beginning of the process.
Most modern operating systems support dynamic linking, that is, the ability of a system to load a library used by a program while the program is being loaded or even while it is running. One use of dynamic linking is the ability to create and use shared object modules. A shared object module (i.e., a dynamic library) enables the system to use the same object module for all of the programs running on the system that use that module. Since the dynamic library is loaded for the program at run time, the system need not load an additional copy of the library if the library is already loaded for another application program. Unfortunately, in many systems, it is not possible today to perform user-level checkpointing and restarting of a program that uses dynamically linked libraries.
Further, when implementing checkpoint/restart from user space, it is necessary to track files that are used by the program. This is typically done by intercepting the basic system calls that manipulate files, such as open( ), close( ), dup( ), etc. The checkpoint library will thus contain, for example, an open( ) function that is called whenever the program calls open( ), either directly or indirectly, such as from inside an fopen( ) call. This function must in turn call the actual open( ) system call. However, the intercepting function cannot simply call open( ), since that call would resolve to the intercepting open( ) library function itself rather than to the system call. This is because the general rule employed by a linker is that if there is a locally defined function, that function takes precedence over a system call function where the functions have the same name. Thus, one problem to be addressed is how to call the actual system call after the interception has occurred.
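By way of illustration only, the bookkeeping described above might be sketched in C as follows. This is a minimal sketch under stated assumptions, not the mechanism of any particular checkpoint library: every identifier (ckpt_table, ckpt_note_open, ckpt_note_close, ckpt_note_dup, ckpt_count_open, ckpt_intercept_open) and both size limits are hypothetical. In an actual intercept library the last function would itself be named open( ); here the real call is passed in as a function pointer precisely because, as noted above, the name open( ) would resolve back to the interceptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits chosen for the sketch only. */
#define CKPT_MAX_FDS  256
#define CKPT_PATH_MAX 256

/* One entry per file descriptor the program currently has open. */
struct ckpt_file {
    int  in_use;
    int  flags;
    char path[CKPT_PATH_MAX];
};

static struct ckpt_file ckpt_table[CKPT_MAX_FDS];

/* Record that fd now refers to path, opened with flags. */
void ckpt_note_open(int fd, const char *path, int flags)
{
    assert(path != NULL);
    if (fd < 0 || fd >= CKPT_MAX_FDS)
        return;                      /* outside the tracked range */
    ckpt_table[fd].in_use = 1;
    ckpt_table[fd].flags  = flags;
    strncpy(ckpt_table[fd].path, path, CKPT_PATH_MAX - 1);
    ckpt_table[fd].path[CKPT_PATH_MAX - 1] = '\0';
}

/* Forget fd after a successful close. */
void ckpt_note_close(int fd)
{
    if (fd >= 0 && fd < CKPT_MAX_FDS)
        ckpt_table[fd].in_use = 0;
}

/* After a successful dup, newfd refers to the same file as oldfd. */
void ckpt_note_dup(int oldfd, int newfd)
{
    if (oldfd < 0 || oldfd >= CKPT_MAX_FDS ||
        newfd < 0 || newfd >= CKPT_MAX_FDS)
        return;
    if (ckpt_table[oldfd].in_use)
        ckpt_table[newfd] = ckpt_table[oldfd];
}

/* Number of descriptors a checkpoint would have to record. */
int ckpt_count_open(void)
{
    int n = 0;
    for (int fd = 0; fd < CKPT_MAX_FDS; fd++)
        n += ckpt_table[fd].in_use;
    return n;
}

/* Stand-in for "the actual open( ) system call"; how to obtain such a
 * handle is exactly the problem stated in the text. */
typedef int (*ckpt_real_open_fn)(const char *path, int flags, int mode);

/* Body of an intercepting open( ): forward to the real call, then
 * record the result. Calling open( ) by name here would recurse. */
int ckpt_intercept_open(ckpt_real_open_fn real_open,
                        const char *path, int flags, int mode)
{
    int fd = real_open(path, flags, mode);
    if (fd >= 0)
        ckpt_note_open(fd, path, flags);
    return fd;
}
```

At checkpoint time, such a table could be walked to save the path and flags of each descriptor in use, so that the files can be reopened on restart.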
The Condor Distributed Processing System provides one embodiment of checkpoint and migration of UNIX and AIX processes. This system is described in various publications, including, for example, Tannenbaum et al., "The Condor Distributed Processing System", Dr. Dobb's Journal, February 1995; and Litzkow et al., "Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System", University of Wisconsin--Madison Computer Science Technical Report #1346, April 1997, available at http://www.cs.wisc.edu/condor/doc/ckpt97.ps. A similar mechanism is employed in International Business Machines Corporation's "LoadLeveler" product, which is described in IBM's manual entitled "LoadLeveler: Using and Administering", Version 1.3, Publication No. SC23-3989, August, 1996.
The Condor system for UNIX recommends using a syscall( ) mechanism available on certain UNIX systems to call a system call without using the actual name of the system call. However, in AIX implementations, there is no such syscall( ) mechanism. One way around this is to employ system call wrappers (to the system kernel) in a separate library from the rest of the intercept library. The system call wrapper library is then dynamically linked separately, and has (for example) its open( ) call bound with the real open( ) system call. By way of example, the library may be dynamically linked with a checkpointable program. All the other libraries are statically linked with the program, and hence all other open( ) calls are bound with the checkpoint open( ) wrapper. Since the wrapper library is dynamically linked with the program, the wrapper library must be present in the proper library location on each machine when the checkpointable program is run. If the dynamic system call wrapper library were not present, the loader would be unable to load and execute the program.
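For illustration, the syscall( ) approach mentioned above might look as follows on a system that provides it. This is a hedged sketch that assumes a Linux-style syscall( ) interface and the SYS_open, SYS_openat and SYS_close constants from <sys/syscall.h>; it is not taken from the Condor sources, and, as stated above, it does not apply to AIX. The function names real_open_via_syscall and real_close_via_syscall are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes syscall( ) in <unistd.h> on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Reach the kernel's open service by number rather than by the name
 * open( ), so that an intercepting open( ) defined elsewhere in the
 * program is not called again. */
int real_open_via_syscall(const char *path, int flags, int mode)
{
    assert(path != NULL);
#ifdef SYS_open
    return (int)syscall(SYS_open, path, flags, mode);
#else
    /* Assumption: platforms without SYS_open provide SYS_openat. */
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
#endif
}

/* The same idea applied to close( ). */
int real_close_via_syscall(int fd)
{
    return (int)syscall(SYS_close, fd);
}
```

An intercepting open( ) could call real_open_via_syscall( ) and then record the returned descriptor. Because no symbol named open( ) is referenced, the linker's name-precedence rule does not come into play.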
In view of the above, it would be desirable to statically link the entire program, thus avoiding the necessity of ensuring that the dynamic library is in place on the machine on which the program is run (or restored/migrated). The present invention is directed to meeting this need by statically linking an application process with a wrapper library while still allowing the desired intercept function.