1. Field of the Invention
The present invention relates generally to scalable network computing services using computing threads. More particularly, the present invention is directed towards lightweight computing threads for scalable Internet services.
2. Description of Background Art
Many Internet web-sites and services use computer thread techniques to handle a large number of simultaneous user sessions. An individual thread is an independent unit of computation that forms a sequential execution stream. In many network applications a thread corresponds to the information required to serve one individual user or a particular service request. The request, in turn, may include one or more function calls that are part of a code segment for performing the request. A thread scheduler is typically used to select a thread to be executed from a ready list of active threads.
Threads may be scheduled based upon a prioritized list or by an un-prioritized round-robin technique.
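By way of illustration only, the two selection policies mentioned above can be sketched as follows. The class and method names (RoundRobinScheduler, PriorityScheduler, add, select) are hypothetical and are not taken from any particular operating system or thread library; the sketch assumes that a lower priority number means a more urgent thread.

```python
from collections import deque
from itertools import count
import heapq


class RoundRobinScheduler:
    """Un-prioritized ready list: threads are selected in FIFO order."""

    def __init__(self):
        self._ready = deque()

    def add(self, thread_id):
        self._ready.append(thread_id)

    def select(self):
        # Take the thread at the head of the ready list and requeue it at
        # the tail, so that every ready thread gets a turn.
        thread_id = self._ready.popleft()
        self._ready.append(thread_id)
        return thread_id


class PriorityScheduler:
    """Prioritized ready list (assumption: lowest number runs first)."""

    def __init__(self):
        self._ready = []
        self._order = count()  # tie-breaker: FIFO order within one priority

    def add(self, thread_id, priority):
        heapq.heappush(self._ready, (priority, next(self._order), thread_id))

    def select(self):
        # Remove and return the most urgent ready thread.
        _, _, thread_id = heapq.heappop(self._ready)
        return thread_id
```

In this sketch the round-robin scheduler keeps a selected thread on the ready list, whereas the priority scheduler removes it; a real scheduler would typically requeue or retire a thread depending on whether it is still runnable.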
A thread typically utilizes hardware registers, stack pointers, scheduling information, address space, stack space, interrupt vectors, and other resources, depending upon the system. Execution of threads can be switched by saving values of CPU registers and other key data.
A thread uses stack, CPU register, and CPU time resources to execute a process. Threads are classified by their “weight.” The weight of a thread corresponds to the amount of context that must be saved when a thread is removed from a processor and later restored. So-called lightweight threads include functionality for creating, deleting, scheduling, and synchronizing threads in a shared memory environment.
Highly parallel programs sometimes use a thread-pool. A thread-pool is a collection of N threads that are used repeatedly to run small tasks. The threads execute in the same memory space, and can therefore work concurrently on shared data. A thread-pool thus serves as a threading service in which a request from an interface is handed off to a thread in the thread-pool.
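A minimal sketch of such a thread-pool, using the ThreadPoolExecutor class of the Python standard library, is given here for illustration. The pool size N_THREADS, the PAGES table, and the functions handle_request and serve are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

N_THREADS = 4  # hypothetical pool size N

# Shared data: all pool threads execute in the same memory space.
PAGES = {"/index": "home", "/about": "about us"}


def handle_request(path):
    # A small task, run by whichever pool thread is free.
    return PAGES.get(path, "not found")


def serve(paths):
    # The same N threads are reused for every request handed to the pool;
    # map() returns the results in the order the requests were submitted.
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        return list(pool.map(handle_request, paths))
```

For example, serve(["/index", "/missing"]) hands two requests to the pool and collects one result per request.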
Conventional thread techniques offer many advantages for multi-tasking a large number of user-sessions. However, network based services may be required to handle large numbers of concurrent user sessions. For example, a popular web-site may have to have the capacity to handle millions of concurrent user sessions. A firewall or proxy box may have to handle large numbers of concurrent connections. Streaming media and media servers may have to serve a large number of audio and video streams corresponding to a large number of smaller processes.
A cost-effective network service is scalable, i.e., can be adapted to handle large numbers of concurrent user sessions, connections, or processes for a given amount of memory resources. Scalability for a large Internet server may be associated with the server having the capacity to handle millions of sessions, i.e., doubling the server memory resources permits twice as many sessions. However, scalability can also refer to the ability of a device with limited memory resources to handle a large number of sessions relative to the memory resources of the device. For example, some compact network appliances have limited memory resources in order to reduce the size and cost of the network appliance. However, in the context of a scalable network appliance the appliance should handle a large number of user sessions in proportion to the memory resources of the network appliance, i.e., a network appliance with one-hundredth the memory resources of a large Internet server preferably handles a proportionately reduced number of simultaneous sessions.
Conventional thread techniques are not as scalable as desired. An active thread may exist in several different states. The thread may be running a computational function using the CPU. Alternately, a thread may be waiting for another operation to occur, such as an input/output function. This is sometimes known as a “blocked” thread because the thread is active but waiting for non-CPU resources to complete its task. A blocked thread consumes memory, file descriptor, and other resources but is unavailable to run code for other active sessions.
A system in which the threads are (on average) blocked for a significant fraction of their process time will require more threads to handle a given number of users. There are methods to switch from blocked threads to other threads. However, if context switches are performed, each context switch also consumes system resources. Each context switch has a system cost that is proportional to the size of the root thread context. The system cost of a context switch can be reduced, somewhat, by using a cooperative threading model in which many of the entries in the register set do not need to be saved and restored. Storage of thread context also consumes stack memory resources. Threads are typically allotted a significant amount of virtual stack address space (e.g., 1 MB on Win32 and 2 MB on Solaris) so that each thread has the stack memory resource required to handle the function calls in a session request.
FIG. 1 is an illustrative diagram of a prior art thread based approach to map every user session onto a thread. Each user 102 has an outstanding user session 105, e.g., each user may be requesting information or viewing pages from a web-site. Each user session 105 is mapped onto a thread in a thread pool 110 using a thread pooling mechanism (not shown in FIG. 1). Note that not all sessions are active at any given time. A session is active only when it is carrying out a new task on behalf of a user.
For example, when a web e-mail user clicks the “Check New Mail” button of an e-mail client, the user's session becomes active while checking for the availability of new e-mails. However, when a user is merely passively reading messages the e-mail session is not active. As a result, only active sessions need to be mapped to a thread. Consequently, as shown in FIG. 1, there can be more users 102 than threads 110.
Thread-based sessions work well for services in which there are comparatively small I/O delays to finish a thread session. As one example, thread-based sessions work well in web servers that serve static web pages from a disk. Each user session 105 for receiving a static web page is a request that causes a thread 110 to read the requested data from the disk and serve the data to the client. A disk read operation for a static web page is completed comparatively quickly so that requests for static web pages can be served in a short amount of time, i.e., the I/O delays are comparatively small.
However, thread-based sessions do not work well for services, such as web-proxies, where the threads are blocked for significant periods of time because of input-output (I/O) operations. A web proxy is used as an intermediary between a client and an upstream server. In a typical situation, a request from the client causes the proxy to issue a request to the upstream server. The proxy then waits for a response to come back from the upstream server before issuing a response to the client. While the web proxy waits for the response to come back from the upstream server, it blocks on an I/O operation for an undetermined length of time. Extra threads are required to handle a given number of sessions, increasing the stack memory resources required for the threads. Note also that there are substantial system costs if a context switch is performed.
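The effect of blocking on the number of threads required can be estimated with a simple queueing relation (Little's law): the average number of threads occupied equals the request arrival rate multiplied by the time each request holds its thread. The sketch here assumes that every request holds one thread for its full CPU time plus blocked time; the function name threads_needed and all numeric values are hypothetical and are not measurements.

```python
import math


def threads_needed(requests_per_second, cpu_ms, blocked_ms):
    """Average number of threads occupied at once, rounded up.

    Assumption: each request holds one thread for cpu_ms + blocked_ms
    milliseconds, so occupancy = arrival rate * holding time.
    """
    return math.ceil(requests_per_second * (cpu_ms + blocked_ms) / 1000)


# Hypothetical static-page server: 1 ms of CPU, 4 ms blocked on a disk read.
static_threads = threads_needed(1000, cpu_ms=1, blocked_ms=4)

# Hypothetical web proxy: 1 ms of CPU, 499 ms blocked on the upstream server.
proxy_threads = threads_needed(1000, cpu_ms=1, blocked_ms=499)
```

Under these assumed figures the same request rate occupies one hundred times as many threads in the proxy case, which is the source of the additional stack memory demand described above.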
FIG. 3 is a more detailed block diagram of a conventional thread system. A user session 302 is mapped by a thread mapper 304 onto an individual thread 300. Each thread 300 is allocated sufficient stack space 305 to handle any function the thread may execute. Consequently, for a thread pool having a total of J threads, where J is an integer, the total stack space allocated to the threads is the summation of the stack space allocated to each of the J threads.
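The summation described above may be written out as follows. The symbols S_total and s_j are introduced here for illustration only, and the thread count in the worked figure is hypothetical; the 1 MB per-thread value is the Win32 example figure cited earlier.

```latex
S_{\text{total}} = \sum_{j=1}^{J} s_j ,
\qquad\text{and, if every thread is allotted the same stack space } s,\qquad
S_{\text{total}} = J \cdot s .

% Worked example (hypothetical J): J = 10{,}000 threads, s = 1\ \text{MB}
S_{\text{total}} = 10{,}000 \times 1\ \text{MB} = 10{,}000\ \text{MB} \approx 10\ \text{GB}
```

The allotment grows linearly with the number of threads J, regardless of how much of each stack a given function actually uses.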
An alternative conventional approach is the asynchronous event-based session shown in the diagram of FIG. 2. For the purposes of illustration, a single thread 210 is shown for executing each user session 105 of a group of simultaneous users 102. The asynchronous event-based session model has the advantage of reducing the time threads spend blocked but at the cost of a more complex programming model in which each session object is a state machine. In the asynchronous model the session objects must encode state transitions to permit a switch from a blocked session to another session. If a thread encounters a potentially blocking operation, such as an I/O operation, it records the current state of the session, suspends the current session, and goes on to pick up another session that needs to be processed. When the blocking operation finishes, the system generates an asynchronous event via asynchronous event generator 220 that marks the session runnable again. Each thread thus only blocks if there are no runnable sessions. This means that a small number of threads (e.g., one thread) can support a larger number of sessions.
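A minimal sketch of this asynchronous model, with one thread driving several session state machines, is given here for illustration. The Session class, its state names (START, WAIT_IO, DONE), and the run function are hypothetical; the asynchronous event generator is simulated by simply completing the oldest pending I/O operation, whereas a real thread would block at that point until an event arrived.

```python
from collections import deque


class Session:
    """Session object written as an explicit state machine."""

    def __init__(self, name):
        self.name = name
        self.state = "START"
        self.response = None

    def step(self, pending_io):
        if self.state == "START":
            # Potentially blocking operation: record the session state and
            # suspend the session instead of blocking the thread.
            pending_io.append(self)
            self.state = "WAIT_IO"
        elif self.state == "WAIT_IO":
            # Resumed after the asynchronous completion event.
            self.response = f"reply for {self.name}"
            self.state = "DONE"


def run(sessions):
    """Drive all sessions to completion on a single thread."""
    runnable = deque(sessions)
    pending_io = deque()
    log = []
    while runnable or pending_io:
        if runnable:
            session = runnable.popleft()
            session.step(pending_io)
            log.append((session.name, session.state))
        else:
            # No runnable sessions: simulated event generator marks the
            # oldest suspended session runnable again.
            runnable.append(pending_io.popleft())
    return log
```

Even in this small sketch, every point at which a session may block needs its own named state, which illustrates why adding a feature to such a program can require new states and transitions.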
The asynchronous model of FIG. 2 works well for systems with stable protocols and well-defined state transitions, such as Internet firewalls and proxies. However, the programming model can be comparatively complex because of the need to preserve state transition information. Moreover, if a new feature is added to the program, new states may have to be added to the state transition diagram. This can greatly increase the work required to upgrade a program, add new features, or modify existing features. The result is that the asynchronous model of FIG. 2 is both more difficult to program and more difficult to upgrade compared with the thread model of FIG. 1.
Conventional thread techniques thus do not have all of the desired features for handling a large number of user sessions. The thread-based model of FIG. 1 is comparatively simple to program, but has problems with scalability, particularly if the threads are blocked because of I/O operations. Additional threads 110 can be added to the thread pool but there are significant system penalties associated with increasing the number of threads 110. This can limit the ability of a network device (e.g., a server or network appliance) to handle a large number of simultaneous sessions. The asynchronous event-based model of FIG. 2 is highly scalable, but requires that the state of each session be recorded for blocked threads. This makes the asynchronous model more difficult to program and upgrade.
Therefore, there is a need for a new thread programming technique that is highly scalable and comparatively simple to program and upgrade.