The present application, Ser. No. 09/184,245, is related to a co-pending application entitled "DETERMINISTIC AND PREEMPTIVE THREAD SCHEDULING AND ITS USE IN DEBUGGING MULTITHREADED APPLICATIONS", filed Nov. 2, 1998 and assigned to the assignee of the present application.
1. Field of the Invention
The present invention generally relates to computer and information handling systems, and more particularly to replicated servers deployed in a distributed system. Still more particularly, the present invention relates to a method and system for enforcing consistency among replicated servers in a distributed system through the use of multicast and deterministic thread scheduling.
2. Description of the Related Art
Computer networks allow users of data-processing systems to retrieve vast amounts of electronic information heretofore unavailable in an electronic medium. Computer networks are increasingly displacing more conventional means of information transmission, such as newspapers, magazines, and television. A computer network connects a set of machines and allows them to communicate with one another. Typical networked systems utilized widely today follow the client/server architecture. In network computing, a client is a process (roughly a program or task) that requests a service provided by another program, the server. The client process may utilize the requested service without having to "know" the working details of the server's operation or the requested service itself.
It is common nowadays to use remote procedure calls (RPCs) in implementing servers in a network-computing environment. Furthermore, remote procedure calls are often referred to as remote method invocations when the client and servers use an object-oriented paradigm for software implementation and communications. We shall use the terms remote procedure calls and remote method invocations interchangeably. In this model of execution, clients formulate their requests in the form of "procedure calls" or "method invocations" that execute on the server machine. The server implements the required procedure calls and methods. During normal operation, it waits to receive requests from its clients across the network. When the network subsystem delivers such a request to the server, the latter creates a "thread" to execute the client's request and generate an appropriate reply. A thread is a lightweight execution unit that lives in the server process's address space and shares its resources with any other threads that are executing other, possibly independent client requests. In this model, the thread starts executing at the procedure call specified by the client's request, and executes until the procedure call returns. The server then sends the value produced by the procedure call back to the client and deallocates the thread. This RPC model of execution has become the centerpiece of distributed computing standards such as the Distributed Computing Environment (DCE), the Common Object Request Broker Architecture (CORBA), and the Distributed Component Object Model (DCOM).
Server architectures are often configured to achieve reliability and high availability utilizing replication. In such systems, several processors or machines may be utilized to provide a service, with each machine replicating the service's state. Such machines are referred to as "server replicas" or simply "replicas". A client may communicate with a subset of the server replicas, where such a subset may include all, some, or only one of the available replicas. A client may select the subset randomly or via pre-defined selection criteria. It is thus necessary that all server replicas maintain identical states in order to ensure a consistent view of the information manipulated by the service, as perceived by the same client or by different clients.
Each replica has its own private implementation of the remote procedures that constitute the service. Execution of a client's request proceeds independently among the different servers and it is important to ensure that the states of the replicas remain consistent despite this independent form of execution. If a server replica fails, the remaining server replicas continue to operate, thereby ensuring uninterrupted service for the clients.
A problem faced by designers in implementing replicated services is to ensure that replicas maintain identical states that reflect client transactions with the service. For example, two different clients may issue two remote procedure calls to update the same record in a database maintained by a replicated service. If the two procedure calls are processed in different orders by two or more replicas, the values of the replicated record may become inconsistent.
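By way of illustration only, the hazard can be reproduced with a minimal sketch in which two replicas apply the same two non-commuting updates to the same record in opposite orders. The update functions, the starting balance, and the helper `apply_in_order` are hypothetical names chosen for this example and are not part of the disclosure.

```python
# Two hypothetical remote procedures that update the same record.
def deposit_100(balance):
    return balance + 100

def add_10_percent_interest(balance):
    # Integer arithmetic keeps the example exact.
    return balance * 110 // 100

def apply_in_order(initial, updates):
    """Apply each update to the record in the order given."""
    value = initial
    for update in updates:
        value = update(value)
    return value

# Both replicas start from the same state and process the same two requests,
# but in different orders.
replica_c = apply_in_order(1000, [deposit_100, add_10_percent_interest])
replica_d = apply_in_order(1000, [add_10_percent_interest, deposit_100])

print(replica_c, replica_d)
```

Because the two updates do not commute, the replicas end with different values for the same record even though each processed every request exactly once.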
There are two properties of RPC systems that may lead two different client requests to execute in two different orders at different server replicas:
First, the network may deliver requests to the server replicas in different orders. For example, if clients A and B send RPCs R and P to server replicas C and D, the thread executing RPC R at server C may start before the thread executing RPC P, if the network delivers R before P at C. Similarly, the thread executing RPC P at server D may start before the thread executing RPC R, if the network delivers P before R at D. Thus, if the network does not deliver the clients' requests in the same order at server replicas C and D, they will execute the requests in different orders and may become inconsistent.
Secondly, the thread scheduler inside each server may schedule the threads that are executing clients' requests in different orders. Conventional thread schedulers use timers to trigger scheduling decisions, and since timers on different machines cannot be kept precisely in step (their clocks drift independently), thread scheduling decisions will not be identical among different server replicas. Thus, even if the network delivers client requests in the same order among all replicas, the thread scheduling may not necessarily obey that order and the executions of the client requests on two different server replicas may thus be different.
All existing distributed computing standards are susceptible to the problem described above. In the past, ordered multicast protocols have attempted to address this problem. They ensure that all server replicas receive the same messages from the network in the same order. Then, execution within each server replica is serialized according to the order specified by the network, such that a request cannot start execution before the previous one finishes. This solution is not satisfactory because it eliminates the benefits of concurrency available within each server, so the performance loss due to replication is large. Furthermore, in the prior art, there was never a coupling between the order specified by the multicast protocol and the execution order of the threads that execute the requests. Based on the foregoing, it can be appreciated that a need exists for an improved method and system for implementing an ordering protocol in combination with a thread scheduling mechanism that ensures all replicas of a server receive and execute clients' requests in the same order. The subject invention herein solves all of these problems in a new and unique manner that has not been part of the art previously.
It is therefore an object of the invention to provide an improved method and system for maintaining the consistency among replicated servers in computer networks.
It is another object of the invention to provide an improved method and system for maintaining the consistency among replicated servers in computer networks where clients and servers communicate via remote procedure calls.
It is yet another object of the invention to provide an improved method and system for maintaining the consistency among replicated servers in computer networks where clients and servers communicate via remote procedure calls, and where servers use multiple threads to execute multiple client requests in parallel and improve performance.
The above and other objects are achieved as is now described. A method and system are disclosed for maintaining consistency among the replicas of a server in a computer network, where clients and servers use remote procedure calls (RPCs) for communications, and where servers use multiple threads to execute client requests. The computer network is assumed to connect one or more clients to a replicated server. Each server replica among the group of servers replicates a particular network service to ensure that the particular network service remains uninterrupted in the event of a server failure. A client's request is formulated in a remote procedure call according to established art. Each server replica implements the desired service in the form of procedure calls or object methods, as is common in the art.
We assume the existence of an ordering multicast protocol that delivers clients' requests (the RPCs) reliably in the same order to all server replicas. Many such protocols have been proposed and implemented, and the current invention can be easily adapted to work with any such protocol as follows. The multicast protocol delivers client requests to server replicas in rounds. During a round, each server replica receives the same set of client requests with an associated execution order that has been decided by the multicast protocol. The order in which the multicast protocol delivers the requests will be enforced among all server replicas. As is common in the art, a multicast round can be empty, delivering no requests. Such empty rounds have been used traditionally to support failure detection and ensure execution progress.
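The round-based delivery described above may be pictured with the following minimal sketch. It assumes, purely for illustration, a single in-process sequencer standing in for the ordering multicast protocol; a real protocol would be distributed and fault tolerant. The names `Round`, `Sequencer`, `Replica`, and `broadcast` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Round:
    number: int
    requests: Tuple[str, ...]  # already in the agreed order; may be empty

class Sequencer:
    """Hypothetical stand-in for an ordering multicast protocol."""
    def __init__(self):
        self._pending = []
        self._next_round = 0

    def submit(self, request):
        # The arrival order at the sequencer becomes the agreed order.
        self._pending.append(request)

    def close_round(self):
        # Package everything submitted so far into one round.
        rnd = Round(self._next_round, tuple(self._pending))
        self._pending = []
        self._next_round += 1
        return rnd

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []

    def deliver(self, rnd):
        self.log.append((rnd.number, rnd.requests))

def broadcast(rnd, replicas):
    # Every replica receives the same round with the same ordering.
    for replica in replicas:
        replica.deliver(rnd)

seq = Sequencer()
replicas = [Replica("C"), Replica("D")]
seq.submit("R from client A")
seq.submit("P from client B")
broadcast(seq.close_round(), replicas)  # round 0 carries two requests
broadcast(seq.close_round(), replicas)  # round 1 is empty

print(replicas[0].log)
```

In this sketch both replicas hold identical logs after each round, and the empty round is delivered like any other, which is how it can serve as a progress signal.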
In association with the multicast protocol, a deterministic and pre-emptive thread scheduler based on instruction counters is utilized. The deterministic scheduler subdivides execution streams into instruction slices such that the number of instructions within each slice is pre-determined. All replicas switch threads according to a known algorithm (e.g. round robin) wherein the scheduling occurs at the end of each instruction slice. That is, every thread runs until it has executed the number of instructions in a slice or until it voluntarily blocks. Therefore, all scheduling decisions are identical everywhere, eliminating the nondeterminism due to the time-based scheduling of traditional thread schedulers.
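The slice-based scheduling described above can be sketched as follows. This is a simplified model under stated assumptions: each thread is a Python generator, each `yield` stands for one executed instruction, and voluntary blocking is omitted. The names `worker` and `run_round_robin` are hypothetical and are not taken from the disclosure.

```python
from collections import deque

def worker(name, n_instructions):
    """Model thread: each yield stands for one executed instruction."""
    for i in range(n_instructions):
        yield f"{name}{i}"

def run_round_robin(threads, slice_size):
    """Run threads round robin, switching after slice_size instructions."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        for _ in range(slice_size):
            try:
                trace.append(next(thread))
            except StopIteration:
                break  # thread finished; do not re-queue it
        else:
            # The full slice was consumed, so the thread goes to the back.
            ready.append(thread)
    return trace

# Two replicas running the same threads make the same scheduling decisions,
# because the switch points depend on instruction counts and not on timers.
trace_c = run_round_robin([worker("a", 3), worker("b", 2)], slice_size=2)
trace_d = run_round_robin([worker("a", 3), worker("b", 2)], slice_size=2)

print(trace_c)
```

Since no clock is consulted, the interleaving is a pure function of the thread set and the slice size, which is the property the scheduler relies on.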
Furthermore, the deterministic scheduler incorporates new threads into the ready queue only during what is termed herein the admission control window (ACW). According to this scheme, the scheduler admits new threads only every m instruction slices, where m is a tunable implementation parameter that regulates the frequency of ACWs. It can be seen that thread executions among all service replicas will remain identical. Threads are admitted during the same ACWs at all replicas, and are scheduled to execute the same number of instructions between context switches. The result is an execution that will have the same output at all replicas.
The occurrence of each ACW is coupled with the arrival of a multicast round. That is, there is a one-to-one correspondence between multicast rounds and the ACWs. The new requests that arrive in a multicast round are admitted into the scheduler's ready queue. If the round is empty, the scheduler continues scheduling the existing threads.
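The coupling between multicast rounds and admission control windows may be illustrated with the following minimal simulation. It makes several assumptions that are not stated in the disclosure: threads are modeled as instruction counters, every request executes at least one instruction, rounds are supplied as a precomputed list, and an idle replica proceeds directly to the next round. The name `run_replica` and the sample rounds are hypothetical.

```python
from collections import deque

def run_replica(rounds, slice_size, m):
    """Simulate one replica.

    rounds: list of rounds; each round is a list of (name, n_instructions)
            pairs in the order agreed by the multicast protocol.
    slice_size: instructions executed per slice.
    m: number of slices between admission control windows (ACWs).
    """
    ready = deque()
    pending = deque(rounds)
    trace = []
    slices_since_acw = m  # open the first ACW immediately
    while ready or pending:
        # An ACW opens every m slices (or when idle) and consumes one round.
        if pending and (slices_since_acw >= m or not ready):
            for name, length in pending.popleft():
                ready.append({"name": name, "pc": 0, "len": length})
            slices_since_acw = 0
            if not ready:
                continue  # empty round and nothing to run yet
        thread = ready.popleft()
        steps = min(slice_size, thread["len"] - thread["pc"])
        for _ in range(steps):
            trace.append(f'{thread["name"]}{thread["pc"]}')
            thread["pc"] += 1
        if thread["pc"] < thread["len"]:
            ready.append(thread)  # unfinished threads go to the back
        slices_since_acw += 1
    return trace

# Round 0 carries request R, round 1 carries request P, round 2 is empty.
rounds = [[("R", 5)], [("P", 3)], []]
trace_c = run_replica(rounds, slice_size=2, m=1)
trace_d = run_replica(rounds, slice_size=2, m=1)

print(trace_c)
```

In this model the threads still interleave, unlike the serialized execution of the prior art, yet both replicas produce the same trace because admission happens only at ACWs tied to the same rounds.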
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.