In a multi-threaded program that employs fork-safe application program interfaces, process replication tends to be relatively slow. The process of fork-safing libraries is also complex, and can itself lead to deadlocks owing to mutex ordering during process replication. As will be familiar to those in the art, a mutex (or ‘mutual exclusion lock’) prevents multiple threads from simultaneously executing portions of code that access shared data, and so serializes the execution of threads.
In one existing approach, to fork-safe a multi-threaded application, an application developer uses the atfork( ) handlers provided in the particular thread implementation (pthread_atfork( ) in the case of POSIX threads). The application developer must ensure that only fork-safe APIs are called between fork and exec in the child process; the developer must also ensure that either the fork-safe libraries do not have any dependencies or—if they do—that the atfork handlers are installed in the correct order.
The pthread_atfork( ) function allows an application to install fork handlers by registering handler functions to be called just before and just after a new process is created with a fork( ) operation. These fork handlers are thus called before and after a fork( ) operation, and in the context of the thread calling fork( ). As with atexit( ) handlers, the application does not need to do anything special for these fork handlers to be called; they are invoked by the system when a fork( ) operation occurs. The pthread_atfork( ) function is declared as follows:
int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));
The ‘prepare’ handler is called before performing the fork( ), the ‘parent’ handler is called in the parent process after performing the fork( ) and the ‘child’ handler is called in the child process after performing the fork( ).
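By way of illustration only, the calling sequence of the three handlers may be observed with a sketch along the following lines. The counter variables, the handler names and the function demo_atfork_calls( ) are hypothetical names introduced for this example; the child reports the counters it observed through its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counters, one per handler, used only to observe the calls. */
static int prepare_calls, parent_calls, child_calls;

static void on_prepare(void) { prepare_calls++; }  /* before fork, in the caller */
static void on_parent(void)  { parent_calls++; }   /* after fork, parent only */
static void on_child(void)   { child_calls++; }    /* after fork, child only */

/*
 * Registers the handlers, forks once, and returns the child's exit status,
 * which encodes the counters as seen by the child:
 *     prepare_calls * 4 + parent_calls * 2 + child_calls
 * Returns -1 on error.
 */
int demo_atfork_calls(void)
{
    int status;
    pid_t pid;

    if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(prepare_calls * 4 + parent_calls * 2 + child_calls);

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child inherits a copy of the counter incremented by the ‘prepare’ handler and then runs only the ‘child’ handler, whereas the parent runs the ‘prepare’ and ‘parent’ handlers.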
All multi-threaded applications that call fork( ) in a POSIX threads program, and do more than simply call exec(2) in the child of the fork, must ensure that the child is protected from deadlock. Since some thread-replication implementations duplicate only the thread that called fork( ), it is possible that—at the time of the call—another thread in the parent will own a lock. This thread is not duplicated in the child, so no thread will unlock this lock in the child. Deadlock occurs if the single thread in the child needs this lock.
This scenario is illustrated schematically at 100 in FIG. 1. Referring to FIG. 1, mutex m1 is locked by thread T2 in the parent process at time t1. The fork( ) is being done by thread T1 in the parent process at time t2. Since only the forking thread will be replicated, the child process has only one thread, T1. Thread T2 releases mutex m1 at time t3, but this is not reflected in the child process. Hence, at time t4, when thread T1 in the child process tries to lock m1, a deadlock results.
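The scenario of FIG. 1 may be reproduced, purely as a sketch, as follows. The pipes used to sequence threads T1 and T2 and the function child_sees_locked_mutex( ) are assumptions made for this example. The child uses pthread_mutex_trylock( ) rather than pthread_mutex_lock( ) so that the example reports the locked state instead of hanging; strictly, pthread_mutex_trylock( ) is not an async-signal-safe function, so this is a demonstration on common implementations rather than portable practice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static int locked_pipe[2];   /* T2 tells T1 that m1 is locked */
static int release_pipe[2];  /* T1 tells T2 that it may unlock m1 */

/* Thread T2: holds m1 across the fork performed by T1. */
static void *t2_main(void *arg)
{
    char byte = 'x';
    ssize_t n;

    (void)arg;
    pthread_mutex_lock(&m1);                  /* time t1 */
    n = write(locked_pipe[1], &byte, 1);
    if (n == 1)
        n = read(release_pipe[0], &byte, 1);  /* wait until after the fork */
    (void)n;
    pthread_mutex_unlock(&m1);                /* time t3 */
    return NULL;
}

/*
 * Runs the scenario as thread T1. Returns 1 if the child found m1 locked,
 * 0 if the child found m1 free, and -1 on error.
 */
int child_sees_locked_mutex(void)
{
    pthread_t t2;
    char byte = 'x';
    int status, result = -1;
    pid_t pid;

    if (pipe(locked_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&t2, NULL, t2_main, NULL) != 0)
        return -1;
    if (read(locked_pipe[0], &byte, 1) != 1)  /* wait until T2 owns m1 */
        return -1;

    pid = fork();                             /* time t2 */
    if (pid == 0) {
        /* Child: only T1 exists here, so nothing will unlock this copy of m1. */
        _exit(pthread_mutex_trylock(&m1) == EBUSY ? 1 : 0);  /* time t4 */
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    if (write(release_pipe[1], &byte, 1) != 1)  /* let T2 unlock m1 */
        result = -1;
    pthread_join(t2, NULL);
    return result;
}
```

A return value of 1 corresponds to the situation at time t4 in FIG. 1: had the child called pthread_mutex_lock( ) instead, it would have waited indefinitely.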
The problem is more serious with locks in libraries. A library writer does not know if the application using the library calls fork( ), so the library must protect itself from such a deadlock scenario. If the application that links with this library calls fork( ) and does not call exec( ) in the child, and if it needs a library lock that may be held by some other thread in the parent that is inside the library at the time of the fork, the application deadlocks inside the library. Such a scenario is depicted schematically in FIG. 2. Thus, referring to FIG. 2, by the time t2, when a fork( ) is called in thread T1, the lock m1 inside the library is in a locked state, since it was locked by thread T2 at the time t1. Even though T2 unlocks m1 at time t3, it is not reflected in the child thread T1. A further call to library_API_call( ) in child thread T1 leads to a deadlock.
In order to make a library safe with respect to fork( ) by using pthread_atfork( ), one must:
(1) identify all locks used by the library (e.g. {L1, L2, . . . , Ln}) and their locking order; and
(2) add a call to pthread_atfork(f1, f2, f3) in the library's .init section, where f1, f2, f3 are defined as follows:
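void f1(void)
{
  /* prepare handler: acquire every lock, in lock order */
  pthread_mutex_lock(L1);
  pthread_mutex_lock(. . .);
  pthread_mutex_lock(Ln);
}
void f2(void)
{
  /* parent handler: release every lock in the parent */
  pthread_mutex_unlock(L1);
  pthread_mutex_unlock(. . .);
  pthread_mutex_unlock(Ln);
}
void f3(void)
{
  /* child handler: release every lock in the child */
  pthread_mutex_unlock(L1);
  pthread_mutex_unlock(. . .);
  pthread_mutex_unlock(Ln);
}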
With this approach, the forking thread blocks on each of these mutexes in turn, locks each one as it becomes available, and then releases all of them in both the parent and the child. In the child, attempting to lock these mutexes will not lead to a deadlock as these locks are available in an unlocked state. This approach is depicted schematically in FIG. 3. Referring to FIG. 3, mutex m1 is locked in thread T2 at time t1. Fork is called at t2 in thread T1. Since the atfork prepare handler must first lock this mutex, the fork cannot proceed further. It waits on mutex m1, which is unlocked by thread T2 at time t3. Subsequently thread T1 gets lock m1 at time t4 and continues with the fork. It unlocks mutex m1 in both child and parent at time t5 by means of the atfork unlock handlers. After that, a mutex_lock(m1) is called by T1 in the child process at time t6, but this does not lead to a deadlock as m1 is in an unlocked state by that time.
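A minimal sketch of a library made fork-safe in this manner is given below, assuming a hypothetical library with two locks, lock_a and lock_b, acquired in that order. The constructor attribute is a GCC/Clang extension used here as a stand-in for the library's .init section, and the sleep-based timing is an assumption that merely makes it likely that thread T2 is inside the library when thread T1 forks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical library locks; the lock order is lock_a, then lock_b. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void lib_prepare(void)   /* f1: acquire every lock, in lock order */
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
}

static void lib_release(void)   /* f2 and f3: release every lock */
{
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

/* Stand-in for the library's .init section (GCC/Clang extension). */
__attribute__((constructor))
static void lib_init(void)
{
    pthread_atfork(lib_prepare, lib_release, lib_release);
}

/* A library API call that needs both locks. */
void library_API_call(void)
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    sleep_ms(50);               /* pretend to work while holding the locks */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

static void *t2_main(void *arg)
{
    (void)arg;
    library_API_call();
    return NULL;
}

/*
 * Thread T1: forks while T2 is (probably) inside the library, then calls the
 * library in the child. Returns 0 if the child completed the call, -1 on error.
 */
int fork_with_handlers(void)
{
    pthread_t t2;
    int status;
    pid_t pid;

    if (pthread_create(&t2, NULL, t2_main, NULL) != 0)
        return -1;
    sleep_ms(10);               /* give T2 time to take the locks */

    pid = fork();               /* waits in lib_prepare until T2 releases */
    if (pid == 0) {
        library_API_call();     /* the locks are free in the child */
        _exit(0);
    }
    pthread_join(t2, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this sketch the fork( ) in thread T1 does not return until thread T2 has left library_API_call( ), which is the waiting behaviour shown between times t2 and t4 in FIG. 3.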
This approach solves the problem but has the following disadvantages:
(1) A multi-threaded application takes a considerable time to effect process replication if the application uses fork-safe libraries that have many locks that must be fork-safed, because the forking thread has to acquire and release every such lock. The forking thread will sequentially block on all the unavailable mutexes and the fork will be delayed until all the mutexes are available.
(2) It is difficult to fork-safe a library if it has dependency on some other library; a library developer must collaborate with other library developers to determine if it is possible to install atfork( ) handlers to avoid a deadlock in customer applications.
This requires co-ordinating which atfork( ) handlers are installed first, based on the order locks are normally acquired in the code path. The order of installation of atfork( ) handlers is based on link order, and the application must ensure that the link order is correct if there is dependency between libraries that are linked.
Care must also be exercised in determining the installation order of these handlers. In particular, inter-library dependencies may cause deadlock if the atfork handlers were not executed (installed) in the correct order. The following example illustrates this effect.
(1) Thread A calls function yyy( ) in libxxx which calls libc's malloc( ) routine. The function yyy( ) acquires the yyy_lock prior to calling malloc( ).
(2) Meanwhile, thread B does a fork( ) before Thread A's call to yyy( )→malloc( ) acquires malloc_lock.
(3) If the libc “pre-fork” handler is executed before the libxxx “pre-fork” handler, then a deadlock occurs. Libc's “pre-fork” handler will acquire malloc_lock without contention, as thread A has not yet acquired it. The libxxx “pre-fork” handler waits for the yyy_lock, which cannot be released until the malloc( ) has completed. However, the malloc( ) cannot complete because its lock has already been acquired by libc's “pre-fork” handler.
Thus, Thread A has acquired yyy_lock and is blocked on malloc_lock. Thread B has acquired malloc_lock and is blocked on yyy_lock. The application is deadlocked. It will be noted that a deadlock could still occur with this incorrect execution order of the atfork handlers even if the first thread had been able to acquire the malloc_lock before the second thread forked.
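The dependence on installation order may be made concrete with the following sketch. Under POSIX, prepare handlers are called in the reverse of the order in which they were registered, while parent and child handlers are called in registration order. The handler names and the function handler_order_after_fork( ) are hypothetical; the handlers only record their names and take no locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char order_log[128];  /* handler names, in the order they were called */

static void note(const char *name)
{
    if (order_log[0] != '\0')
        strncat(order_log, " ", sizeof order_log - strlen(order_log) - 1);
    strncat(order_log, name, sizeof order_log - strlen(order_log) - 1);
}

/* Stand-ins for libc's handlers (would lock and unlock malloc_lock). */
static void c_prepare(void)   { note("c.prepare"); }
static void c_parent(void)    { note("c.parent"); }

/* Stand-ins for libxxx's handlers (would lock and unlock yyy_lock). */
static void xxx_prepare(void) { note("xxx.prepare"); }
static void xxx_parent(void)  { note("xxx.parent"); }

/*
 * Registers libc's handlers first and libxxx's second, forks once, and
 * returns the order in which the handlers ran in the parent.
 */
const char *handler_order_after_fork(void)
{
    pid_t pid;

    order_log[0] = '\0';
    pthread_atfork(c_prepare, c_parent, NULL);      /* installed first */
    pthread_atfork(xxx_prepare, xxx_parent, NULL);  /* installed second */

    pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return order_log;
}
```

With this installation order the libxxx prepare handler runs before the libc prepare handler, so yyy_lock would be taken before malloc_lock, matching the order used by yyy( ). Reversing the two pthread_atfork( ) calls would produce the deadlock-prone order described in step (3).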
It will also be noted that the POSIX threads standard states that it is only safe to call async-signal-safe functions in the child of a multi-threaded process before the child calls exec. If it is desired to call any other library function during this window, the library providing it must be fork-safed as described above. This becomes complicated if there are many fork-safe libraries with interdependencies.
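For comparison, a child that confines itself to async-signal-safe functions between fork and exec needs no fork-safe libraries at all. The following sketch assumes that a shell is available at /bin/sh; the function spawn_and_wait( ) and the command it runs are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks, replaces the child with "/bin/sh -c 'exit 7'", and returns the
 * child's exit status, or -1 on error. Between fork and exec the child
 * calls only execv() and _exit(), both of which are async-signal-safe.
 */
int spawn_and_wait(void)
{
    char *const argv[] = { "sh", "-c", "exit 7", NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv("/bin/sh", argv);
        _exit(127);             /* reached only if execv() failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the child takes no locks before its process image is replaced, the state of any mutex held by another thread of the parent at the time of the fork is immaterial.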