The execution of a multi-threaded application program in a computer system generally involves a large number of threads, both in the user space and in the kernel space of the operating system. Generally speaking, the user-space threads are serviced by corresponding kernel threads in the kernel space of the operating system.
In the 1×1 thread model, there is a corresponding kernel thread for each existing user-space thread. To facilitate discussion, FIG. 1 shows a 1×1 thread model wherein an application program 102 generates three user-space threads 104, 106, and 108. These user-space threads are serviced by corresponding kernel threads 110, 112, and 114 in the kernel space of the operating system. The one-to-one correspondence between user-space threads and kernel threads is shown by the three lines 120, 122, and 124. The three kernel threads are scheduled for execution by a kernel scheduler 130 on two processors 132 and 134 of FIG. 1.
A complex application program may create hundreds or thousands of user-space threads during its execution lifetime. If a kernel thread is required for each existing user-space thread, thousands of kernel threads may exist concurrently, which together require a large amount of kernel resources. Because kernel resources are expensive, the 1×1 thread model is not necessarily the most efficient.
FIG. 2 shows an M×N thread model in which a smaller number (N) of kernel threads exist for a larger number (M) of user-space threads. With reference to FIG. 2, three user-space threads 202, 204, and 206 from an application program 208 are serviced by two kernel threads 210 and 212 via a user-space scheduler 214, which may implement any suitable scheduling algorithm (e.g., round-robin, weighted round-robin, modified round-robin, etc.). Another user-space thread 216 is shown having a dedicated kernel thread 218 to illustrate that the M×N thread model may also have dedicated one-to-one relationships between user-space threads and kernel threads in order to provide backward compatibility and/or to improve performance for certain user-space threads/applications.
Since fewer kernel threads are multiplexed among a larger number of user-space threads in FIG. 2, fewer kernel resources are required to service the application. Since kernel resources are expensive, the M×N thread model has become increasingly popular.
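To make the multiplexing of FIG. 2 concrete, the following is a minimal sketch of a user-space scheduler that hands M runnable user-space threads to N kernel threads in round-robin order. It is a simulation only, not code from any actual thread library; the function name round_robin_schedule and its parameters are hypothetical, and the reference numerals of FIG. 2 are used simply as labels.

```python
from collections import deque


def round_robin_schedule(user_threads, kernel_threads, time_slices):
    """Return a list of (time_slice, kernel_thread, user_thread) tuples.

    Illustrative only: in each time slice every kernel thread takes the
    next runnable user-space thread from the front of the run queue, and
    the threads that ran are put back at the tail afterwards.
    """
    run_queue = deque(user_threads)
    schedule = []
    for time_slice in range(time_slices):
        running = []
        for kernel_thread in kernel_threads:
            if not run_queue:
                break  # fewer runnable user threads than kernel threads
            user_thread = run_queue.popleft()
            running.append(user_thread)
            schedule.append((time_slice, kernel_thread, user_thread))
        # Requeue the threads that just ran, preserving their order.
        run_queue.extend(running)
    return schedule


# Labels follow FIG. 2: user-space threads 202, 204, and 206 share
# kernel threads 210 and 212.
schedule = round_robin_schedule([202, 204, 206], [210, 212], time_slices=3)
for entry in schedule:
    print(entry)
```

In this sketch only two kernel threads ever exist, yet over three time slices each of the three user-space threads runs twice, which is the resource saving the M×N model aims for.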
The M×N thread model, however, requires a facility in the kernel to invoke a routine in the user space during system call returns. This facility, called “upcall,” is substantially similar to a signal handler in its requirement for the execution of a user-space routine from the kernel. Currently, the invocation of the user-space routine from the kernel is required during a system call return or a trap return from the kernel.
To facilitate discussion, FIGS. 3 and 4 illustrate how a system call may be made and how a signal is handled in the system call return path in an M×N thread model system, such as in an HP-UX™ system by the Hewlett-Packard Company of Palo Alto, Calif. Currently, the mechanism illustrated in FIGS. 3 and 4 represents the primary mechanism available to execute a user-space code segment from the kernel (i.e., by launching a signal handler).
FIG. 3 illustrates a simplified method of handling system calls between the user space and the kernel space. In FIG. 3, a read system call is used as an example, although other system calls may work analogously. In block 302, the application makes a system call read( ), which begins in a user-space thread. The user-space thread then jumps from read( ) to the user-space stub read_sys( ) to handle the system call in block 304.
From the user-space stub read_sys( ), a gateway facilitates a privileged transfer to the kernel space, which has a higher privilege than the user space. Specifically, the privileged transfer invokes a system call initialization routine syscallinit (block 306), which saves the user-space context of the application (i.e., the contents of the register state) into a save-state.
In block 308, the high-level system call sys_call( ) is invoked in order to, for example, marshal the necessary arguments for the actual kernel system call READ, which is shown in block 310. The kernel system call READ then performs the action required by the application, as specified originally in block 302. In the case of READ, the reading of data may involve waiting until the I/O device (e.g., the hard disk or data on a network) becomes available. In such a case, the kernel thread is “blocked” and the system call is essentially in a wait state, waiting for unblocking to occur before the system call can be completed. Reference number 312 illustrates this blocking of the kernel system call in FIG. 3.
Suppose the I/O device permits reading at some point in time. Once the kernel system call READ acquires the required data, the kernel thread is unblocked and can initiate the system call return process. In block 314, the context data, which was previously saved in block 306, is restored via the syscallinit routine. Once the registers are restored, the user-space privilege is restored to allow the thread to branch back to the user-space stub read_sys( ) of block 304. From the user-space stub read_sys( ) of block 304, a branch is made to the original user-space system call read( ) in block 302, which then completes the system call made by the application. This is how system calls work on HP-UX™.
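The round trip of FIG. 3 can be sketched as a small simulation. This is a hedged illustration of the save-and-restore sequence only, not HP-UX code: the function simulate_read, the dictionary-based register model, and the zeroing of registers while in the kernel are all assumptions made for the example.

```python
def simulate_read(user_registers, device_ready=False):
    """Trace the FIG. 3 path for a read system call (simulation only).

    Returns the register state seen by the application after the call,
    together with a trace of the blocks that were visited.
    """
    trace = ["302 read( )", "304 read_sys( ) stub"]

    # Block 306: privileged transfer; syscallinit saves the user context.
    save_state = dict(user_registers)
    trace.append("306 syscallinit: save user context")

    # Assumption for illustration: the kernel reuses the registers, so
    # the user values are no longer present while in kernel space.
    registers = {name: 0 for name in user_registers}

    trace.append("308 sys_call( ): marshal arguments")
    trace.append("310 READ")
    if not device_ready:
        # Reference number 312: the kernel thread waits for the device.
        trace.append("312 blocked, then unblocked")

    # Block 314: restore the context saved in block 306.
    registers = dict(save_state)
    trace.append("314 restore user context")

    # Return through the stub to the original call.
    trace.append("304 read_sys( ) stub")
    trace.append("302 read( ) completes")
    return registers, trace


before = {"pc": 0x1000, "sp": 0x7FF0}
after, trace = simulate_read(before)
for step in trace:
    print(step)
```

Note that this path involves exactly one save of the user context (block 306) and one restore (block 314), which serves as the baseline when the signal handler path is considered.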
Generically speaking, in an M×N thread system, signal handlers are launched from the kernel by saving the current execution context onto the thread's user stack. The signal handler return frame is then pushed onto the user stack to allow the signal handler to return to the kernel once it finishes executing on the syscall or trap return path. A branch to the user-space signal handler is made, followed by execution of the user-space signal handler code. On return from the signal handler, control transfers to the fabricated frame, which is set up to execute the system call sigcleanup, which in turn transfers control back to the kernel.
FIG. 4 is a flowchart illustrating how signal handling is performed during the system call return path in an HP-UX™ system that implements M×N threading. Block 402 represents a thread running in user space. At some point this thread needs to make a system call and thus enters the kernel. On entry into the kernel, the thread saves its context data in block 404 (as shown earlier in FIG. 3, block 306). After the system call completes, on the return path in block 406, the kernel thread sees the signal and has to handle it. It first saves the current execution context in the kernel (block 408), including, for example, the current kernel registers, in order to service the signal.
In block 410, the user context data stored earlier in block 404 is copied onto the user stack in the user space for restoration at a later time. In block 412, the user stack in the user space is modified to allow a system call sigcleanup frame to be pushed onto the user stack. This frame is set up to allow the cleanup to take place after the signal handler finishes. Subsequently, block 414 branches to the user space. In block 416, a signal handler is launched in the user space to execute the signal handler user-space code. In block 418, the sigcleanup frame, which was pushed onto the user stack earlier in block 412, is popped to enable a privileged transfer back into the kernel to restore previously saved context data in the kernel. Thus, in block 420, in the kernel space, the user context data is saved. This context data saving step is analogous to that performed in block 404 earlier.
In block 422, the execution context stored earlier in block 408 is restored. Block 422 takes the thread back to user space from where it entered the kernel. The operation described above, although for a system call, is applicable to the trap path as well.
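The sequence of FIG. 4 can be modeled as operations on a user stack. The following is a rough sketch under stated assumptions, not the HP-UX implementation: the function deliver_signal_on_return, the tuple-based stack frames, and the choice to resume from the copied user context once the sigcleanup frame has been popped are all hypothetical and serve only to show the order of the saves, the copy, and the frame push and pop.

```python
def deliver_signal_on_return(user_context, kernel_context, handler):
    """Simulate the FIG. 4 return path.

    Returns (resume_context, restored_exec_context, events).
    """
    events = []
    user_stack = []

    # Block 404: user context saved on entry into the kernel.
    saved_user_context = dict(user_context)
    events.append("404 save user context")

    # Blocks 406-408: signal seen; save the current execution context.
    saved_exec_context = dict(kernel_context)
    events.append("408 save execution context")

    # Block 410: copy the saved user context onto the user stack.
    user_stack.append(("user_context", dict(saved_user_context)))
    events.append("410 copy user context to user stack")

    # Block 412: push the sigcleanup frame on top of it.
    user_stack.append(("sigcleanup", None))
    events.append("412 push sigcleanup frame")

    # Blocks 414-416: branch to user space and run the handler.
    handler()
    events.append("416 run signal handler")

    # Block 418: popping the sigcleanup frame re-enters the kernel.
    frame_kind, _ = user_stack.pop()
    if frame_kind != "sigcleanup":
        raise RuntimeError("unexpected frame on user stack")
    events.append("418 pop sigcleanup frame")

    # Block 420: user context is saved again on kernel re-entry.
    events.append("420 save user context")

    # Block 422: restore the execution context saved in block 408.
    restored_exec_context = saved_exec_context
    events.append("422 restore execution context")

    # Assumption: the copy made in block 410 is what the thread
    # resumes from when it returns to user space.
    _, resume_context = user_stack.pop()
    return resume_context, restored_exec_context, events


handled = []
resume, restored, events = deliver_signal_on_return(
    {"pc": 0x1000, "sp": 0x7FF0},
    {"kpc": 0x42},
    lambda: handled.append("signal handled"),
)
for event in events:
    print(event)
```

Counting the events shows three context saves (blocks 404, 408, and 420) and one copy (block 410) for a single signal delivery, compared with one save for a plain system call.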
While the approach above works fine for signal handlers, significant performance degradation is experienced if the signal handler approach is employed to handle “upcalls.” To illustrate, FIG. 5 shows the steps implementing an upcall as a signal handler in an M×N thread system. In block 502, the application makes a system call that requires the use of a kernel thread. The user-space context data is saved once upon entering the kernel, as described before in block 404 of FIG. 4. In block 504, the kernel thread is blocked. With respect to the READ system call example, the kernel thread may be blocked while waiting for the I/O device to become available. After the kernel thread is unblocked, the unblock handler launch setup occurs in block 506. In this case, the unblock handler is set up as a signal handler.
In block 506, the current execution context is saved, and the user-space context data stored earlier upon entering the kernel space is copied onto the user stack in the user space. The steps associated with block 506 are analogous to those described in blocks 408, 410, and 412 of FIG. 4. After block 506, a branch into the user space occurs.
In block 508, the unblock handler is launched, which results in, for example, notification to the user-space scheduler that unblocking of the kernel thread has occurred. The launching and execution of the unblock handler are analogous to the steps described in connection with blocks 416 and 418 of FIG. 4. After block 508, a branch back into the kernel space occurs (block 510). Also in block 510, the user context data is saved upon entering the kernel space.
In block 512, the execution context stored earlier in block 506 is restored, enabling a return to the user space (block 516).
As can be seen in FIG. 5, implementing an upcall as a signal handler requires, in addition to the saving of the user-space context upon entering the kernel space (e.g., between blocks 502 and 504 of FIG. 5), two additional save operations and a copy operation as overhead. The first save operation involves saving the current execution context in block 506. The second save operation involves saving the user-space context upon reentering the kernel space (block 510). The copy operation involves copying the user-space context data onto the user stack (block 506). These overhead operations are handled using kernel resources, which tend to be the limiting factor in most systems since kernel resources are comparatively expensive. Additionally, an application may involve hundreds or thousands of system calls in its execution lifetime. With the extra overhead on the kernel resources impacting each system call blocking and unblocking, the arrangement of FIG. 5 results in a significant degradation in system performance.
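The accumulated overhead can be tallied with simple arithmetic. The sketch below only counts the operations named above for FIG. 5; it does not measure real cost, and the figure of 1000 blocking system calls is a hypothetical workload chosen to match the "hundreds or thousands" mentioned in the text. The names PLAIN_SYSCALL, UPCALL_VIA_SIGNAL, and extra_overhead are likewise illustrative.

```python
# Operation counts per blocking system call, read off FIG. 5.
PLAIN_SYSCALL = {"save": 1, "copy": 0}      # save on kernel entry only
UPCALL_VIA_SIGNAL = {"save": 3, "copy": 1}  # plus saves in blocks 506 and
                                            # 510 and the copy in block 506


def extra_overhead(blocking_calls):
    """Extra saves and copies incurred by the signal-handler upcall."""
    return {
        op: (UPCALL_VIA_SIGNAL[op] - PLAIN_SYSCALL[op]) * blocking_calls
        for op in PLAIN_SYSCALL
    }


# Hypothetical workload: 1000 blocking system calls.
overhead = extra_overhead(1000)
print(overhead)  # {'save': 2000, 'copy': 1000}
```

Under this count, each blocking system call triples the number of context saves and adds one copy, all of which are performed with kernel resources.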