In a concurrently filed application having the title ARCHITECTURE FOR SCALABLE ON-LINE SERVICES NETWORK, there is disclosed a client-server architecture in which service applications are distributed and replicated across groups (referred to as "service groups") of application servers. Within a service group, each application server independently runs the service application that implements the server portion of the corresponding on-line service. When an end user opens the on-line service, the user is assigned to one of the application servers within the service group, and that application server processes service requests initiated by the user until the on-line service is closed by the user. Advantageously, the architecture is capable of handling tens of thousands of simultaneous user connections. Further, the architecture permits application servers to be efficiently reallocated to different service groups to accommodate changes in the usage levels of different on-line services.
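By way of illustration only, the per-session assignment described above might be sketched as follows. The class name `ServiceGroup`, its method names, and the round-robin assignment policy are assumptions made for this sketch; the text states only that the user is assigned to one of the application servers in the service group and stays with it until the service is closed.

```python
from itertools import cycle
from typing import Dict, List


class ServiceGroup:
    """Hypothetical model of a service group of replicated application servers."""

    def __init__(self, service: str, servers: List[str]) -> None:
        self.service = service
        # Assumption: round-robin assignment. The source does not say how
        # a server is chosen, only that one server in the group is chosen.
        self._next_server = cycle(servers)
        self._sessions: Dict[str, str] = {}

    def open(self, user: str) -> str:
        """Assign the user to one application server when the service is opened."""
        server = next(self._next_server)
        self._sessions[user] = server
        return server

    def route(self, user: str) -> str:
        """Return the server that handles every request in the user's session."""
        return self._sessions[user]

    def close(self, user: str) -> None:
        """End the session when the user closes the service."""
        self._sessions.pop(user, None)
```

In this sketch, every request a user makes between `open` and `close` is routed to the same server, which is the property the architecture relies on.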
Certain types of on-line services on the network provide user access to service content data that is updated by the on-line service on a transaction-by-transaction basis. For example, a Bulletin Board System (BBS) service allows users to read and download messages, and allows users to post new messages for review by other users. For performance reasons, it is desirable to have each replicated service application maintain a duplicate copy of such service content data on its respective application server. This enables each application server to provide read-only access to the service content data without accessing other servers. For example, when a user requests to view a BBS message, the BBS server to which the user is assigned can provide access to the message without having to access an external database.
To achieve this objective, some mechanism is needed to ensure that all replicated service applications (each running on a respective server) update their locally stored copies of the service content data in a consistent manner, so that all application servers of the service group contain the same content. Stated differently, a mechanism is needed to "synchronize" the independently running service applications, so that all service applications provide access to identical data, and so that the on-line service appears the same to all end users.
One replication technique that is commonly used in the art of distributed databases is known as the "two-phase commit" protocol. Under this protocol, updates to replicated data sets on different servers are made in two phases. During the first phase, a "coordinator" informs the other servers of the update, and each server returns a message to the coordinator indicating whether or not that server can perform the update. During the second phase, the coordinator decides, based on the responses of the other servers (plus its own vote), whether or not the update can be made, and then informs the other servers of the decision. If all of the servers have indicated that the update can be performed, the coordinator instructs the servers to perform the update. Otherwise, the coordinator instructs the other servers to abort the update. The two-phase commit protocol is further described in George Coulouris et al., Distributed Systems: Concepts and Design, Second Ed. (Addison-Wesley Publishing Co., 1994), pp. 414-421.
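The two phases described above can be illustrated with a minimal, single-process sketch. The names `Server`, `vote`, `commit`, `abort`, and `two_phase_commit` are hypothetical and chosen for this illustration. A real implementation would exchange network messages and handle timeouts and failures, none of which is modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    """One server holding its own copy of the replicated data set (hypothetical)."""
    name: str
    can_commit: bool = True  # assumption: a flag stands in for the server's real readiness check
    data: Dict[str, str] = field(default_factory=dict)

    def vote(self, key: str, value: str) -> bool:
        # Phase 1: report whether this server can perform the update.
        return self.can_commit

    def commit(self, key: str, value: str) -> None:
        # Phase 2, commit decision: apply the update to the local copy.
        self.data[key] = value

    def abort(self, key: str) -> None:
        # Phase 2, abort decision: leave the local copy unchanged.
        pass


def two_phase_commit(coordinator: Server, others: List[Server],
                     key: str, value: str) -> bool:
    """Run one update through both phases; return True if it was committed."""
    # Phase 1: the coordinator informs the other servers and collects their
    # votes, then adds its own vote.
    votes = [server.vote(key, value) for server in others]
    votes.append(coordinator.vote(key, value))

    # Phase 2: the update is made only if every server voted yes.
    decision = all(votes)
    for server in [coordinator] + others:
        if decision:
            server.commit(key, value)
        else:
            server.abort(key)
    return decision
```

Even in this simplified form, each update involves one exchange with every server to collect votes and a second exchange to announce the decision.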
One problem with the two-phase commit protocol is that it is poorly suited to an on-line services network that handles large numbers of concurrent user connections. Because every update requires two rounds of message exchange between the coordinator and each of the other servers before the update can take effect, the two-phase commit method would create a bottleneck for on-line services that receive and process large numbers (hundreds of thousands to millions) of update requests per day, degrading the quality of the on-line service from the perspective of end users. What is needed, therefore, is a mechanism for efficiently processing update requests made to replicated, transaction-based services.
What is also needed is an efficient mechanism for bringing the content of an application server up to date with that of other application servers, so that new application servers can be added to service groups (when, for example, application servers are reallocated to different service groups), and so that existing application servers can efficiently be taken off-line for maintenance.