The present invention relates to the implementation of signals in a multithreaded environment and, more particularly, to the proper delivery of asynchronous signals in a system supporting both user-level and kernel-level threads.
A thread is a sequence of control within a process and has little context information associated with it. Consequently, a thread has minimal storage requirements. Threads may be grouped into two classes: user threads, which are visible only at the user level, and kernel threads, which are visible only at the kernel level. A hybrid of these classes is the lightweight process (LWP), a kernel-visible entity which is also visible at the user level. Finally, there is the process, which is also visible to both the user and the kernel, and carries with it the full complement of context information. A process is generally capable of standalone execution.
A traditional, single-threaded process follows a single sequence of control during program execution. However, as multiprocessing (MP) architectures have become more prevalent, the use of multiple threads has become an important programming paradigm. A multithreaded (MT) process has several sequences of program control, and is thus capable of taking several independent actions. In an MP system, these actions can occur simultaneously, with each thread running on a separate processor. This parallelism increases the performance of MT applications. However, MT also provides performance benefits on uniprocessor systems by improving the overlap of operations such as computation and input/output transactions. MT also offers several other benefits, such as more uniform computational loads, higher throughput, reduced overhead and simplified program structure.
While there are numerous ways to implement MT, the techniques fall primarily into three general categories. These categories are based on the number of user-level threads associated with a given number of kernel-level threads: 1) Many-to-one; 2) One-to-one; and 3) Many-to-many.
Implementations of the many-to-one model allow an application to create a large number of threads that can execute concurrently. This model is also referred to as a "user-level threads" model. In a many-to-one implementation, all thread activity is restricted to user space and is supported entirely by a user threads library, which takes care of all thread management issues. The user threads library limits a process to executing only one thread at any given time, although there is a choice as to which thread is executed. As a result, this model provides only limited parallelism and does not fully exploit the advantages offered by MP architectures.
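By way of illustration only, the many-to-one arrangement can be sketched with Python generators standing in for user threads and a single loop standing in for the user threads library. The names `worker` and `run_many_to_one` are hypothetical and are not drawn from any particular threads library; the round-robin policy is likewise an assumption made for the sketch.

```python
from collections import deque


def worker(name, steps):
    """A toy user-level thread: each yield is a point where it gives up control."""
    for i in range(steps):
        yield f"{name}:{i}"


def run_many_to_one(threads):
    """Run every user thread on one flow of control, one step at a time.

    Only one thread executes at any instant; the library (this loop)
    chooses which one runs next. Round-robin is an assumed policy.
    """
    ready = deque(threads)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # run the thread until its next yield
        except StopIteration:
            continue                     # thread finished; do not requeue it
        ready.append(current)            # still runnable; back of the queue
    return trace


if __name__ == "__main__":
    print(run_many_to_one([worker("A", 2), worker("B", 1)]))
```

Because a single loop drives every thread, adding processors would not make this sketch run any faster, which is the limitation noted above.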
In the one-to-one model, each user-level thread has access to the kernel. A primary constraint here is that the developer must use caution due to the threads' kernel visibility. Another constraint is that the developer must be frugal in the use of these threads, due to the storage requirements of such threads. As a consequence, many implementations of the one-to-one model limit the number of threads which may be created. This type of thread is similar to the LWP mentioned above.
The many-to-many model avoids several of these limitations. This model, also referred to as the "two-level" model, minimizes programming effort. The computational "cost" and "weight" of each thread are also reduced. The many-to-many model can support as many threads as the many-to-one model, while providing kernel access similar to that in the one-to-one model. In addition to supporting LWPs, this model also provides a user-level threads library. The LWPs are visible to both the user and the kernel, as in the one-to-one model. The user threads are only visible to the user, as in the many-to-one model. User threads may then be associated with LWPs upon execution. Thus, the kernel need only manage the currently active LWPs. A many-to-many implementation simplifies programming at the user level, while effectively removing restrictions on the number of threads an application can use.
An example of a two-level MT architecture is the Solaris™ OS from SunSoft™ (Sun™, Sun Microsystems™, Solaris and SunSoft are trademarks or registered trademarks in the United States and other countries). The Solaris two-level architecture separates the programming interface from the implementation by providing LWPs. The kernel dispatches each LWP separately, so each LWP is capable of performing independent system calls, incurring independent page faults, and running in parallel on multiple processors. LWPs allow the use of threads without the need to consider their interaction with the kernel. At the user level, the threads library includes a scheduler that is separate from the kernel's scheduler. The threads library thus manages multiplexing and scheduling runnable threads onto the LWPs. User threads are supported at the kernel level by the LWPs.
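The multiplexing of runnable user threads onto a smaller pool of LWPs can be sketched roughly as follows. This is a simplified model and not the Solaris implementation: ordinary Python threads stand in for LWPs, generators stand in for user threads, and the names `user_thread` and `run_many_to_many` are hypothetical.

```python
import queue
import threading


def user_thread(name, steps):
    """A toy user-level thread; each yield is a rescheduling point."""
    for i in range(steps):
        yield f"{name}:{i}"


def run_many_to_many(user_threads, lwp_count):
    """Multiplex the given user threads onto lwp_count stand-in LWPs."""
    run_queue = queue.Queue()
    for name, gen in user_threads.items():
        run_queue.put((name, gen))
    trace = {name: [] for name in user_threads}

    def lwp_loop():
        while True:
            try:
                # Pick any runnable user thread, whichever LWP ran it last.
                name, gen = run_queue.get_nowait()
            except queue.Empty:
                return  # nothing runnable right now; this LWP retires
            try:
                step = next(gen)
            except StopIteration:
                continue  # user thread finished
            trace[name].append(step)
            run_queue.put((name, gen))  # still runnable; requeue it

    lwps = [threading.Thread(target=lwp_loop) for _ in range(lwp_count)]
    for lwp in lwps:
        lwp.start()
    for lwp in lwps:
        lwp.join()
    return trace


if __name__ == "__main__":
    threads = {n: user_thread(n, 3) for n in ("A", "B", "C")}
    print(run_many_to_many(threads, lwp_count=2))
```

A user thread is held by at most one stand-in LWP at a time, so its own steps stay in order even though different LWPs may run successive steps. The sketch shows only the structure of the multiplexing; it does not model kernel dispatch or the growing and shrinking of the LWP pool.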
Solaris provides the application programmer with the option of binding a user-level thread to an LWP, or leaving the thread unbound. Binding a user-level thread to an LWP establishes a connection between the two. Thread binding is useful to applications that need to maintain strict control over their own concurrency (i.e., parallel execution), such as those applications requiring real-time response. Unbound user-level threads defer control of their concurrency to the threads library, which automatically grows and shrinks the pool of LWPs to meet the demands of the application's unbound threads. An LWP may also be shared by several user-level threads.
A primary feature of the above implementation is that the user threads are not visible to the kernel, implying that a thread's attributes are also invisible to the kernel. This off-loads the administrative tasks associated with the user threads from the kernel to the user threads library and user process. However, attributes such as a thread's signal mask are also invisible to the kernel. A signal is a mechanism within a kernel for notifying an executable entity of system events. These events may take the form of inter-process communications, system responses to service requests, error conditions and other such occurrences. A mask, in this context, refers to a mechanism by which an executable entity such as a thread can temporarily ignore one or more signals. A signal can usually be masked, and a masked signal may be held by the kernel as pending. This allows a program to have certain signals temporarily held as pending during the execution of critical code sections to prevent interruption by those signals.
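The masking behavior described above can be observed on a POSIX system with Python's standard `signal` module. The following is a minimal sketch, assuming Python 3.8 or later, a Unix-like platform, and that the function is called from the main thread; the function name `demo_masked_signal` and the choice of `SIGUSR1` are illustrative only.

```python
import signal


def demo_masked_signal(signum=signal.SIGUSR1):
    """Mask a signal around a critical section and report what happened.

    Returns (pending_while_masked, handled_while_masked, handled_after_unmask).
    """
    received = []
    previous_handler = signal.signal(signum, lambda s, frame: received.append(s))
    # Block the signal for the calling thread; the kernel holds it as pending.
    previous_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        signal.raise_signal(signum)  # arrives during the "critical section"
        pending_while_masked = signum in signal.sigpending()
        handled_while_masked = bool(received)
    finally:
        # Restoring the old mask lets the pending signal be delivered.
        signal.pthread_sigmask(signal.SIG_SETMASK, previous_mask)
    handled_after_unmask = bool(received)
    if previous_handler is not None:
        signal.signal(signum, previous_handler)  # put the old handler back
    return pending_while_masked, handled_while_masked, handled_after_unmask


if __name__ == "__main__":
    print(demo_masked_signal())
```

On a typical system, where the signal was not already blocked by the caller, the result should be `(True, False, True)`: the signal is held pending while masked, the handler does not run inside the critical section, and the handler runs once the mask is restored.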
The challenge in implementing this scheme lies in eliciting correct (i.e., expected) program behavior in areas such as signal delivery and scheduling, while retaining the performance and resource conservation goals achieved by minimizing kernel involvement. This is particularly important in the area of delivering asynchronous signals.
Some of the problems which may occur in this regard are best explained by example. In the example, a process (P1) has two threads (T1 and T2) and one LWP (L1). T1 is scheduled on L1 and is blocked in a system call (e.g., waiting for the system to service a request). T1 has the signal SIG masked and so will not respond to SIG. T2 is pending on a user-level queue, awaiting notification by a process-local synchronization variable (e.g., a condition variable). T2 does not have signal SIG masked. P1 has a signal handler (e.g., interrupt routine) installed for signal SIG which is responsible for getting T1 out of its blocking system call. P1 is waiting for SIG to be delivered from another process. When SIG is sent to P1, P1 expects T2 to wake up and run the signal handler, thus activating T1.
The kernel needs to deliver SIG without explicit knowledge of T1's or T2's signal masks, while ensuring that masking semantics, information delivery and signal queuing are correctly maintained. In addition, signal masking should remain an essentially user-level operation, thus retaining the performance advantages previously mentioned.
One possible manifestation of this problem is encountered by systems which cause T1's mask to be inherited by L1 ("pushed down" to L1). Because L1 has SIG masked in this scenario, the signal handler will never be executed and T1 will never be awakened, resulting in a deadlock condition.
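The push-down failure can be made concrete with a toy model of the example (P1 with threads T1 and T2 and one LWP, L1). Every class and function name below is hypothetical, and the model captures only the mask bookkeeping, not any real kernel interface.

```python
from dataclasses import dataclass, field


@dataclass
class UserThread:
    name: str
    masked: frozenset  # signals this user thread has masked


@dataclass
class LWP:
    name: str
    masked: set = field(default_factory=set)    # the kernel's view of the mask
    pending: list = field(default_factory=list)


def schedule_with_pushdown(lwp, thread):
    """Assumed policy: the LWP inherits the mask of the thread it runs."""
    lwp.masked = set(thread.masked)


def kernel_send(lwp, sig):
    """The kernel consults only the LWP's mask; user threads are invisible."""
    if sig in lwp.masked:
        lwp.pending.append(sig)  # held as pending
        return False
    return True


def pushdown_scenario():
    t1 = UserThread("T1", frozenset({"SIG"}))  # blocked in a system call
    t2 = UserThread("T2", frozenset())         # asleep on a user-level queue
    l1 = LWP("L1")
    schedule_with_pushdown(l1, t1)
    delivered = kernel_send(l1, "SIG")
    t2_would_accept = "SIG" not in t2.masked
    return delivered, t2_would_accept, list(l1.pending)


if __name__ == "__main__":
    print(pushdown_scenario())
```

In this model the signal is never delivered even though T2 would accept it. Since the handler is what releases T1 from its system call, L1 never becomes free to run T2, and SIG stays pending indefinitely.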
Another possible manifestation of this problem is the kernel inadvertently interrupting T1's system call when delivering SIG to L1. Although the global handler would make sure that the signal was re-directed to T2 and that T1 would not actually run the signal handler, the side effect of interrupting T1 would be unexpected and unavoidable. This scenario violates at least one aspect of the signal model, namely the signal masking semantic. This semantic requires that, if a thread masks a signal and is blocked in a system call, the system call not be interrupted by the masked signal.
Yet another possible manifestation in the above implementation is a pathological scenario where all application LWPs and their associated threads mask a specific signal. However, a thread might be sleeping on the user-level sleep queue with this signal unmasked. The signal may never be delivered to this thread because no thread currently running on an LWP will accept the signal. Signal non-delivery thus results.
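The non-delivery condition amounts to a mismatch between what the kernel can see (LWP masks) and what the process actually contains (thread masks). The following sketch expresses that mismatch as a predicate; the function names and the dictionary representation of masks are assumptions made purely for illustration.

```python
def kernel_sees_taker(lwp_masks, sig):
    """True if at least one LWP, as seen by the kernel, has sig unmasked."""
    return any(sig not in mask for mask in lwp_masks.values())


def process_has_taker(thread_masks, sig):
    """True if at least one user thread, running or asleep, has sig unmasked."""
    return any(sig not in mask for mask in thread_masks.values())


def is_stranded(lwp_masks, thread_masks, sig):
    """A signal is stranded when some thread would accept it
    but every LWP visible to the kernel has it masked."""
    return process_has_taker(thread_masks, sig) and not kernel_sees_taker(lwp_masks, sig)


if __name__ == "__main__":
    # Both LWPs run threads that mask SIG; T3 sleeps at user level with SIG unmasked.
    lwp_masks = {"L1": {"SIG"}, "L2": {"SIG"}}
    thread_masks = {"T1": {"SIG"}, "T2": {"SIG"}, "T3": set()}
    print(is_stranded(lwp_masks, thread_masks, "SIG"))
```

The predicate is false as soon as any LWP has the signal unmasked, and also false when no thread in the process wants the signal, in which case holding it as pending is the correct behavior.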
The effects of actual signal delivery should also be considered. Actual signal delivery should only be performed once. Moreover, the actual delivery of the signal should only be to the thread that has the signal unmasked. Otherwise, the semantics which commonly govern the generation and delivery of signals may be breached.
Accordingly, it is desirable and important to provide for the proper delivery of asynchronous signals in a two-level, multithreaded environment. Furthermore, the delivery of these signals should not cause unexpected side effects or breach signaling semantics. Finally, deadlock and non-delivery of signals should be avoided.