1. Technical Field
This invention generally relates to computer systems, and more specifically relates to application servers and how they respond to the failure of a backend.
2. Background Art
Many computer systems today are highly networked to provide a variety of computing services. For example, client computer systems may connect to server computer systems to request certain services from the server computer systems. When a user uses a web browser to request a web page, the user's computer system typically opens a connection to a web server that hosts the web page, and the web server then delivers the requested web page to the user's web browser. One specific type of server that is known in the art is called an application server because it services requests from software applications. Application servers are often connected to various backend systems, such as a database, a different application server, a messaging service, etc. The term “backend” as used herein means any computer system that may be coupled to an application server.
An application server often uses a “pool” of connections to connect to a backend. Maintaining a connection pool allows a connection to be re-used without the cost of creating and verifying a connection each time a connection is needed. A connection pool typically specifies a number of allowable connections to the backend. When a thread in the application server needs to make a request to the backend, it first obtains a connection from the connection pool, then makes the request using the connection. If the maximum number of allowable connections is already being used, the next request must wait until one of the connections in the pool finishes its current task and becomes available.
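By way of illustration only, the pooling behavior described above may be sketched as follows. This is a minimal sketch, not the implementation of any particular application server. The names `ConnectionPool`, `create_connection`, `acquire`, and `release` are hypothetical, and the sketch assumes all connections are created eagerly when the pool is built.

```python
import queue


class ConnectionPool:
    """Hypothetical fixed-size pool: at most max_connections are in use at once."""

    def __init__(self, create_connection, max_connections):
        # Create every connection up front so later requests re-use them
        # instead of paying the creation and verification cost each time.
        self._idle = queue.Queue(maxsize=max_connections)
        for _ in range(max_connections):
            self._idle.put(create_connection())

    def acquire(self, timeout=None):
        # Blocks while every connection is in use. With timeout=None the
        # caller waits indefinitely; otherwise queue.Empty is raised once
        # the timeout expires.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so that a waiting request can proceed.
        self._idle.put(conn)


# Usage: a pool with two allowable connections (object() stands in for a
# real backend connection).
pool = ConnectionPool(create_connection=object, max_connections=2)
first = pool.acquire()
second = pool.acquire()

try:
    pool.acquire(timeout=0.05)  # pool exhausted, so this request must wait
    third_request_waited = False
except queue.Empty:
    third_request_waited = True

pool.release(first)            # a connection becomes available again
reused = pool.acquire(timeout=0.05)
```

In this sketch a request that finds the pool exhausted simply waits, which is the behavior that becomes a problem when the connections already handed out never return.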
A problem associated with connection pooling occurs in the context of a backend failure. Many backends have no architected way to indicate they have failed. As a result, when a backend fails, connections in the connection pool will simply hang because the backend is not able to service the requests. Many backends are being designed to quickly recover (or fail over) in the event of a failure. However, even the quick recovery of a failed backend does not resolve the problem of the stalled connections in the connection pool. Application servers typically use a TCP timeout mechanism to time out a connection when one end of the connection becomes unresponsive. However, the TCP timeout value is typically a global value for a computer system, so it cannot be customized for different applications. Often, the TCP timeout value is specified in minutes, and may be significantly longer than the time required for the backend to fail over.
A simple example will illustrate the problem in the prior art. Let's assume an application server has a connection pool for a DB2 database backend that is designed to fail over in ten seconds. Let's also assume that the TCP timeout value is set to five minutes, and we have a connection pool with six allowable connections. We further assume that all six allowable connections in the connection pool are being used by the application server to access the DB2 database. We now assume the DB2 database fails. The six pending connections in the connection pool will hang because the DB2 database has failed. Even though it takes only ten seconds for the DB2 database to fail over, the six connections will remain hung until five minutes of inactivity have elapsed on each connection. The result is two-fold: 1) the DB2 database cannot be accessed for nearly five minutes after it failed over, even though it failed over in ten seconds; and 2) threads that are waiting on the hung connections will be unable to do any work for the five minute period. In addition, if the application server has reached its maximum number of threads and all of them are servicing hung connections, there will be no way for the application server to do any work, even work that does not access the database, until the connections time out and the threads bound to the connections are freed up. The backend is thus unavailable for nearly five minutes while waiting for the TCP timeout period for each connection to expire, even though the backend recovers from a failure in ten seconds. Without a way to more efficiently deal with backend failure in an application server, the computer industry will continue to suffer from poor performance in an application server when a backend fails.
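The arithmetic of the example above may be worked through as follows. This is a sketch under stated assumptions: the constant and function names are hypothetical, all six connections are assumed to become inactive at the moment the database fails, and each hung connection is assumed to block exactly one thread.

```python
FAILOVER_SECONDS = 10          # time for the DB2 backend to fail over
TCP_TIMEOUT_SECONDS = 5 * 60   # global TCP timeout of five minutes
POOL_SIZE = 6                  # allowable connections, all in use at failure


def unreachable_after_failover(tcp_timeout, failover):
    """Seconds the recovered backend stays unreachable through the pool.

    Assumes every pooled connection hangs at the moment of failure and is
    released only when the TCP timeout expires.
    """
    return max(tcp_timeout - failover, 0)


def blocked_thread_seconds(pool_size, tcp_timeout):
    """Total thread time lost, assuming one blocked thread per hung connection."""
    return pool_size * tcp_timeout


wasted = unreachable_after_failover(TCP_TIMEOUT_SECONDS, FAILOVER_SECONDS)
lost = blocked_thread_seconds(POOL_SIZE, TCP_TIMEOUT_SECONDS)

print(wasted)  # 290 seconds, i.e. nearly five minutes
print(lost)    # 1800 thread-seconds across the six hung connections
```

Under these assumptions the backend is healthy for 290 of the 300 seconds during which the pool cannot reach it, which is the gap the example describes as "nearly five minutes".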