This invention relates to computer systems and particularly to methods for providing high performance multi-tasking on servers. More particularly, the invention relates to an efficient way to handle multiple work units in servers while providing high throughput and low response times. Typical high performance servers need to process many jobs in parallel with the least possible overhead. On multi-processor machines, multi-tasking is essential for better throughput, but even on a single processor it ensures that the response times for short tasks are not affected by the longer ones.
There are several approaches to achieving multi-tasking in prevailing systems. Using multiple processes is the easiest approach and is provided by most operating systems. The upsides are ease of programming and maintenance. The downside is that cache performance tends to degrade as the number of processes increases. Further, because scheduling is preemptive, context switching by the kernel is expensive and becomes less efficient with an increasing number of processes. If one instead maintains a low number of processes, then blocking events reduce processor utilization.
Using multiple threads is another common approach, with support from both kernel and user space. Kernel threads suffer from drawbacks similar to those of the process approach. User-space threads have better performance, especially for pre-forked threads, but synchronization is an issue due to the preemptive nature of the scheduler. Also, when multiple threads accept connections on the same socket, they are placed on the same wait queue. When a new connection is made, all threads on the socket's wait queue are awakened. All but one of the threads, however, will put themselves back on the queue to wait for the next connection. This unnecessary awakening, commonly referred to as the thundering herd problem, creates scalability problems for server applications.
Co-routines provide another solution, similar to user threads but without preemption. However, co-routine creation is expensive, especially when only portable mechanisms are used.
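By way of illustration only, the non-preemptive behavior of co-routines may be sketched as follows. This is a minimal sketch that assumes Python generators as a stand-in for co-routines; the names task, run_round_robin, and log are hypothetical and do not correspond to any particular prior system. Each task gives up control only at its own yield point, so no task is interrupted at an arbitrary instruction.

```python
from collections import deque


def task(name, steps, log):
    """Hypothetical cooperative task: one unit of work, then a voluntary yield."""
    for i in range(steps):
        log.append((name, i))  # the "work" is simply recording progress
        yield                  # the only point at which control is given up


def run_round_robin(tasks):
    """Resume each ready task in turn until every task has finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished; do not reschedule it
        ready.append(current)      # task yielded; place it at the back


log = []
run_round_robin([task("long", 3, log), task("short", 1, log)])
```

In this sketch the short task completes after the first unit of the long task rather than waiting for the long task to finish, which is the response-time benefit described above. Note that each task still carries its own suspended frame, so the cost of switching between execution contexts remains.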
Further, in each of the above approaches, there is a fixed cost due to switching between call stacks, which limits the performance even in cases when tasks can be run to completion.
Many of the aforementioned issues can be resolved by using a single-threaded state-machine model, but this approach involves considerable programming complexity because state information must be maintained for each request. Also, it might not always be viable, especially when third-party libraries are used. It is thus apparent that there is a need in the art for a portable, lower-overhead solution that provides high-performance parallelism in event-driven asynchronous servers.
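The programming burden of the single-threaded state-machine model mentioned above may be illustrated by the following minimal sketch. It assumes a hypothetical three-phase request (read, process, write); the names State, Request, step, and run are illustrative assumptions only. Every intermediate value that a thread would keep on its call stack must instead be stored explicitly in the request record between steps.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    READ_REQUEST = auto()
    PROCESS = auto()
    WRITE_RESPONSE = auto()
    DONE = auto()


@dataclass
class Request:
    """Hypothetical per-request record holding all state between steps."""
    payload: str
    state: State = State.READ_REQUEST
    parsed: str = ""
    response: str = ""


def step(req):
    """Advance one request by exactly one phase, then return to the loop."""
    if req.state is State.READ_REQUEST:
        req.parsed = req.payload.strip()
        req.state = State.PROCESS
    elif req.state is State.PROCESS:
        req.response = req.parsed.upper()
        req.state = State.WRITE_RESPONSE
    elif req.state is State.WRITE_RESPONSE:
        req.state = State.DONE  # a real server would write to a socket here


def run(requests):
    """Single-threaded loop that interleaves all requests, one phase at a time."""
    pending = list(requests)
    while pending:
        for req in pending:
            step(req)
        pending = [r for r in pending if r.state is not State.DONE]
```

For example, calling run([Request(" a "), Request("bc")]) advances both requests in lockstep on a single call stack. Even in this small sketch, each additional phase requires a new state and new fields in the record, which suggests how the complexity grows for realistic request handling.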