This invention relates to the field of distributed transactional processing. In particular, this invention is drawn to improving database availability in a multi-tier transaction processing system.
Internet commerce applications are typically distributed applications with at least a three-tier architecture. One tier handles presentation (e.g., the user interface), another tier handles the logic of the application, and yet another tier handles the data. Clients (e.g., browser software) handle the presentation. Application servers such as web servers handle the core logic of the application, but are otherwise stateless. A database contains the data and the state of the application. An application server processes a client-submitted request and stores the resulting state in the database. The application server then returns a result to the client. This architecture has become prevalent with the advent of commerce on the Internet.
The database tier tends to be the greatest availability bottleneck in the three-tier architecture. One approach for increasing the availability of the database tier is through the use of parallel database systems. Parallel database systems have multiple database processes accessing the same data. If one database process fails, the client can be switched to another process. Physical implementation is possible using either a shared-nothing approach or a shared-everything approach for data storage resources.
Each of these solutions relies on time consuming transaction log recovery/reconciliation before another database process can take over for the failed process. In addition, these approaches tend to require special hardware/software solutions to enable processes executing on different machines to access any shared resources.
Various methods for implementing replicated database systems include active replication and primary/backup replication. Active replication delivers update requests in the same order to all the databases. One disadvantage of this approach is that the states of the individual databases may diverge due to the non-deterministic nature of transaction processing. In a primary/backup approach, a primary processes transactions and checkpoints its state to one or more backups. This approach may resolve non-determinism issues for parallel processes because only the primary actually executes the transactions.
One disadvantage of traditional primary-backup methods is that they assume a unique primary at all times. This is difficult to ensure in practice, gives rise to conservative timeout values for detecting the crash of a primary, and cannot handle network partitions.
Methods and apparatus for enabling fast failover in a database tier of a three tier asynchronous network comprising at least one database client and a plurality of database servers are provided.
One method of processing transactions includes the step of concurrently processing a plurality of transactions including a selected transaction on a current primary of the plurality of database servers. Each transaction is broadcast to the remaining database servers for serial execution in an order determined by the order in which the current primary encountered their respective commit requests. The transaction is executed on all remaining database servers as long as the identity of the current primary has not changed. The transaction is committed on every database server if the identity of the current primary has not changed throughout execution. The committed result is returned from at least one database server to the database client.
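By way of illustration only, the ordering rule of this method may be sketched as follows. The sketch is single-process and in-memory; the class names Primary and Backup, the methods begin, write, commit and deliver, and the representation of a transaction as a list of key/value writes are assumptions made for the example and do not appear in the description. The check on the identity of the current primary is omitted for brevity. The sketch is intended to show only that transactions interleaved on the primary reach the remaining servers in the order of their commit requests and are executed there one at a time.

```python
class Backup:
    """A non-primary database server that replays transactions serially."""

    def __init__(self):
        self.data = {}
        self.log = []  # transaction ids, in the order they were executed here

    def deliver(self, txn_id, writes):
        # Broadcast transactions are executed one at a time, in delivery order.
        for key, value in writes:
            self.data[key] = value
        self.log.append(txn_id)


class Primary:
    """The current primary; transactions may be open concurrently."""

    def __init__(self, backups):
        self.backups = backups
        self.data = {}
        self.open_txns = {}  # txn_id -> buffered writes of in-flight transactions

    def begin(self, txn_id):
        self.open_txns[txn_id] = []

    def write(self, txn_id, key, value):
        self.open_txns[txn_id].append((key, value))

    def commit(self, txn_id):
        # The commit request fixes the transaction's position in the broadcast order.
        writes = self.open_txns.pop(txn_id)
        for backup in self.backups:
            backup.deliver(txn_id, writes)
        for key, value in writes:
            self.data[key] = value


backup = Backup()
primary = Primary([backup])

# Two transactions are interleaved on the primary.
primary.begin(1)
primary.begin(2)
primary.write(1, "a", 10)
primary.write(2, "b", 20)

# Transaction 2 requests commit first, so it is broadcast first.
primary.commit(2)
primary.commit(1)

print(backup.log)                   # [2, 1]
print(backup.data == primary.data)  # True
```

In this sketch the broadcast is a direct method call; an actual embodiment would rely on a broadcast mechanism over the asynchronous network that delivers transactions to every server in the same order.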
Another method includes the step of receiving a transaction from the database client by a first database server identified as a primary for execution. Every transaction is initiated on the database server identified as the primary during an initial epoch. The transaction is processed on the first database server while the first database server is identified as the current primary. The transaction is broadcast to all database servers in response to a commit request of the transaction. The transaction is committed on the first database server if the first database server has been the primary exclusively throughout the transaction execution. The broadcast transaction is executed by any receiving database server that is not the primary. The broadcast transaction is committed on the receiving database server if the epoch of the received transaction is the same as a current epoch that identifies the current primary.
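As a hedged illustration of the epoch comparison performed by a receiving database server, consider the following minimal sketch. The names Transaction, ReceivingServer, on_broadcast and change_epoch are hypothetical, as is the modelling of an epoch as an integer that is replaced whenever a new primary is identified; none of these is prescribed by the description. The sketch assumes that a broadcast transaction carries the epoch in which the primary processed it.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    txn_id: int
    epoch: int    # epoch in which the primary processed the transaction
    writes: dict  # key -> value updates produced by the transaction


@dataclass
class ReceivingServer:
    current_epoch: int = 0
    data: dict = field(default_factory=dict)
    committed: list = field(default_factory=list)

    def change_epoch(self, new_epoch: int) -> None:
        # Called when a different server is identified as the primary.
        self.current_epoch = new_epoch

    def on_broadcast(self, txn: Transaction) -> bool:
        """Execute a broadcast transaction; commit only if its epoch is current."""
        if txn.epoch != self.current_epoch:
            return False  # the primary changed: the transaction is not committed
        staged = dict(self.data)   # execute against a private copy
        staged.update(txn.writes)
        self.data = staged         # commit by installing the new state
        self.committed.append(txn.txn_id)
        return True


server = ReceivingServer()

# A transaction from the current epoch is executed and committed.
first = server.on_broadcast(Transaction(txn_id=1, epoch=0, writes={"x": 1}))

# The primary changes; a late transaction from the old epoch is rejected.
server.change_epoch(1)
second = server.on_broadcast(Transaction(txn_id=2, epoch=0, writes={"x": 2}))

print(first, second)  # True False
print(server.data)    # {'x': 1}
```

Because the sketch is single-threaded, the epoch cannot change between the comparison and the commit; a concurrent embodiment would need to make that comparison and the commit atomic with respect to epoch changes.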
Another method includes the step of concurrently executing a plurality of transactions including a selected transaction on a first database server during an epoch exclusively identifying the first database server as a primary. The selected transaction is committed on the first database server in response to a commit request of the selected transaction if the commit request is encountered before the epoch is changed. The selected transaction is broadcast to a second database server upon receipt of the commit request. The selected transaction is non-concurrently executed on the second database server as long as the epoch is not changed. The selected transaction is committed on the second database server if execution is completed and the epoch has not changed.
Other features and advantages of the present invention will be apparent from the accompanying drawings and from the detailed description that follows below.