Conventional server applications follow a paradigm of request, process, and then respond. In a multi-processor environment, server applications attempt to create enough worker threads to keep all processors executing application code at all times. An example of a typical server application is one that services database queries. After the client makes its query against the database, the server loads and scans index pages, loads and scans data pages, builds up a result set, and so forth. Because server applications typically process a client request from start to finish, the server tends to reach points where contention for a global resource or an input/output operation blocks further processing. In other words, “thrashing” of the global state (data structures, cache memory, etc.) occurs at the expense of the local state (the request). The processor caches become overwhelmed by constantly fetching new code and/or data from either RAM or disk. Moreover, context switching occurs, which causes program threads to interfere with one another as the data needed by the new thread overwrites the data being used by the previous thread.
As another example, consider a server application that tracks the number of occurrences of the strings and characters it has been given. In this example, the application has two primary functions, namely, ADD and DUMP. The ADD function accepts an arbitrary string and updates the reference counts for the string and for the characters making up the string. The DUMP function returns an extensible markup language (“XML”) file containing all of the strings and characters and their reference counts.
According to the prior art, the server application in this instance includes the steps of parsing the inbound request; deciding on the required action; performing the ADD; and performing the DUMP. The ADD function includes performing a lookup of the string in a hash table and, if it is not found, creating a record, preparing the record, and then inserting the record in the table. The ADD function then increments the reference count and iterates across the characters in the string, incrementing the reference counts in a character table (e.g., a 256-entry double word array indexed by character). The DUMP function iterates across the hash table to generate the string XML and iterates across the character table to generate the character XML.
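By way of illustration only, the ADD and DUMP functions described above may be sketched as follows. The class name `OccurrenceCounter`, the XML element and attribute names, the UTF-8 byte indexing, and the choice of one counter per possible byte value are assumptions made for this sketch and are not taken from any particular prior-art implementation.

```python
from xml.sax.saxutils import quoteattr


class OccurrenceCounter:
    """Hypothetical single-threaded sketch of the ADD and DUMP functions."""

    def __init__(self):
        self.strings = {}        # hash table: string -> reference count
        self.chars = [0] * 256   # assumed: one counter per possible byte value

    def add(self, s):
        # Look up the string; if it is not found, create and insert a record.
        if s not in self.strings:
            self.strings[s] = 0
        # Increment the string's reference count.
        self.strings[s] += 1
        # Iterate across the characters (here, UTF-8 bytes) and count each one.
        for b in s.encode("utf-8"):
            self.chars[b] += 1

    def dump(self):
        # Iterate across the hash table to generate the string XML.
        parts = ["<counts>"]
        for s, n in self.strings.items():
            parts.append('<string value=%s count="%d"/>' % (quoteattr(s), n))
        # Iterate across the character table to generate the character XML.
        for code, n in enumerate(self.chars):
            if n:
                parts.append('<char code="%d" count="%d"/>' % (code, n))
        parts.append("</counts>")
        return "".join(parts)
```

In this sketch, each call to `add` touches both shared tables, and `dump` reads both tables in full, which is why these structures become points of contention once several request threads run at the same time.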
In this example, processing of the string table, the character table, and the system heap for the hash records and outbound XML may cause contentions. For instance, if the hash table is not locked before lookups are performed, one thread may attempt to perform an insertion while another is performing a lookup. A conventional server application such as this one spends an undesirable amount of time serializing access to the shared data structures and context switching among all of the request threads. Moreover, writes on different processors continually invalidate cache lines, and running under well-known Web server software causes thrashing of the instruction cache.
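The serialization described above may be illustrated by the following minimal sketch, in which every request thread must acquire a single lock before touching the shared tables. The names `LockedCounter` and `run_requests`, the single coarse-grained lock, and the worker and repeat counts are hypothetical choices made for illustration only.

```python
import threading


class LockedCounter:
    """Hypothetical sketch: shared tables guarded by one coarse-grained lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.strings = {}        # shared hash table: string -> reference count
        self.chars = [0] * 256   # shared character table (assumed 256 entries)

    def add(self, s):
        # Every request thread waits here, so lookups and insertions
        # never overlap, but the threads run one at a time in this section.
        with self._lock:
            self.strings[s] = self.strings.get(s, 0) + 1
            for b in s.encode("utf-8"):
                self.chars[b] += 1


def run_requests(counter, words, workers=4, repeats=1000):
    """Simulate several request threads issuing ADD requests concurrently."""

    def worker():
        for _ in range(repeats):
            for w in words:
                counter.add(w)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The lock keeps the counts consistent, but it does so by forcing the request threads through the shared tables one at a time, which is the serialization cost noted above.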
These problems are particularly apparent with enterprise-class server applications involving multiple processors. Those skilled in the art recognize that enterprise-class server applications tend to dominate the machines on which they run and, thus, function like single-function appliances. This is true for database servers, web servers, mail servers, search engines, ad servers, and the like.
For these reasons, a framework for server applications is desired for increasing the number of simultaneous requests that can be handled and for maximizing throughput while minimizing latency, thereby reducing contentions and improving cache coherency. Such a framework is further desired for optimizing the global state of the machine at the expense of the local state of the request.