The present invention relates generally to Web servers, and particularly to solving resource leaks in Web servers.
The recent rapid increase in the demand for network services has caused a similar increase in the number and complexity of Web servers produced to serve that demand. Because of this complexity, Web servers are exhaustively tested in development before being taken into production, where they serve customers in real time. Once in production, however, Web servers can be difficult to test, and the problems revealed can be difficult to solve. One common problem is the resource leak, in which some finite resource is gradually consumed in a manner that is detrimental to the stability or performance of the Web server. A common example is the memory leak.
Web servers provide information components, such as Web pages, word processing documents, spreadsheets, images, movies, and the like, to customers. Some of these information components are static, and therefore can be provided without further processing. Other information components are dynamic, and must be generated by an information component generation process before delivery to a customer. A resource leak occurs when a resource is marked as “in use” while the information component generation process is working, but is not released when the process no longer needs it. The resource may be memory, synchronization objects, communication ports, or other finite computer resources. Because the resource is never released, it is not available for use by the same or other processes. Over time, such resource leaks can cause the process, or the entire operating system, to malfunction.
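The leak mechanism described above can be illustrated with a minimal sketch. The names `ResourcePool`, `generate_page_leaky`, and `generate_page_fixed` are hypothetical and are introduced only for illustration; the sketch assumes a small fixed pool standing in for any finite resource, and a generation process that forgets to release its resource on one rarely taken code path.

```python
class ResourcePool:
    """A finite pool of resources, each either free or marked "in use"."""

    def __init__(self, size):
        self.size = size
        self.in_use = set()

    def acquire(self):
        # Mark the first free slot as in use and hand it to the caller.
        for slot in range(self.size):
            if slot not in self.in_use:
                self.in_use.add(slot)
                return slot
        raise RuntimeError("resource pool exhausted")

    def release(self, slot):
        # Return the slot to the pool so other processes can use it.
        self.in_use.discard(slot)


def generate_page_leaky(pool, request):
    """Hypothetical generation process that leaks on an unusual request."""
    slot = pool.acquire()
    if request == "unusual":
        # Early return skips the release below: the slot stays "in use".
        return "error page"
    page = f"page for {request}"
    pool.release(slot)
    return page


def generate_page_fixed(pool, request):
    """Same process, but the resource is released on every path."""
    slot = pool.acquire()
    try:
        if request == "unusual":
            return "error page"
        return f"page for {request}"
    finally:
        pool.release(slot)
```

In this sketch, common requests never leak, so the flaw stays hidden until enough unusual requests have arrived to exhaust the pool, at which point even common requests fail.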
Resource leaks in production Web servers are typically time-consuming and difficult to diagnose and correct, for several reasons.
First, resource leaks often happen very slowly, so days or weeks are spent collecting data to evaluate the effectiveness of each proposed solution. If many solutions must be tried, this process can consume months or more.
Second, resource leaks often happen on production Web servers. Because these Web servers must be highly reliable, personnel diagnosing the resource leak may not be allowed to significantly modify the system. One common way of diagnosing a resource leak is to disable parts of the leaking process, enabling and disabling them individually. When the resource leak disappears, the parts disabled at that time are usually responsible for it. This strategy is generally disallowed on production Web servers, since the loss of functionality associated with disabling components is unacceptable. Diagnostic software tools are also often used to detect and diagnose resource leaks. Tools such as leak detectors can be very valuable for finding small leaks, but their use on production servers is often forbidden. Administrators responsible for the stability of production Web servers do not want to put that stability in jeopardy by running additional diagnostic and debugging software on the Web server.
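The disable-and-observe strategy described above can be sketched as follows. This is an illustration only: `find_leaking_parts`, `simulated_leak`, and the part names are hypothetical, and the sketch assumes that the amount leaked by a given configuration can be measured and that a single part is responsible for the leak.

```python
def find_leaking_parts(parts, measure_leak):
    """Disable one part at a time and report those whose removal stops the leak.

    parts        -- names of the parts of the process (hypothetical)
    measure_leak -- callable taking the list of enabled parts and returning
                    the amount of resource leaked by that configuration
    """
    suspects = []
    for part in parts:
        # Run with every part enabled except the one under test.
        enabled = [p for p in parts if p != part]
        if measure_leak(enabled) == 0:
            # The leak disappeared while this part was disabled.
            suspects.append(part)
    return suspects


def simulated_leak(enabled):
    """Stand-in measurement: only the hypothetical 'render' part leaks."""
    leak_per_part = {"auth": 0, "render": 5, "log": 0}
    return sum(leak_per_part[p] for p in enabled)
```

Calling `find_leaking_parts(["auth", "render", "log"], simulated_leak)` identifies `render` as the suspect. Each trial runs the server with reduced functionality, which is the reason the strategy is generally unacceptable in production.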
Third, resource leaks are often associated with unusual or unexpected circumstances. Most software systems used on production Web servers have undergone significant testing before being deployed. Leaks associated with commonly used features of the Web server are therefore unusual, because they are typically found and corrected before the software is deployed. When a leak is found on a production Web server, it is frequently associated with a feature that was not heavily tested, perhaps because the feature did not seem important, or because the particular sequence of user actions was not predicted. Identifying the existence of a resource leak on a production Web server is generally easy: the resource is observed to be exhausted. Identifying the cause of the resource leak is much harder, since it is typically not associated with any common functionality. Typically, when a flaw or bug is found in software, the circumstances that lead to the display of the flaw are reproduced in a laboratory. Once a fix has been proposed, that fix is implemented and tested in the laboratory. This model does not work well for leaks detected on a production system, since the usage characteristics of a production system are complicated and difficult to characterize, and are therefore difficult to reproduce in a laboratory.