The invention relates to scalable and fault-tolerant computer systems.
The need for fast, reliable and secure access to vast amounts of shared data worldwide has been driving the growth of the multiprocessing paradigm, in which applications, data storage, processing power, and other resources are distributed among a pool of processors. A number of architectures have been developed over time to address the requirements of multiprocessing. Depending on the resources that the processors share, multiprocessing architectures may be classified into three classes: shared-everything architectures, shared-nothing architectures, and shared-something architectures.
One example of a shared-everything architecture is a Symmetric Multiprocessing (SMP) architecture. An SMP system is capable of scaling multi-process or multi-threaded loads so that application code can run on any processor in the system without software changes. Adding new throughput to the SMP system may be as simple as adding a new CPU board, provided the operating system can take advantage of it. Implementations of SMP generally provide a plurality of CPU and memory boards which communicate with each other and with input/output boards over a wide and fast bus.
The SMP approach demands close communication between processors. Maintaining consistency between processors is also non-trivial. The overhead and complexity of the consistency protocols may adversely affect the scalability of the SMP architecture. Further, processors in the SMP architecture typically share one copy of the operating system. In addition to limiting the scalability of the SMP architecture, sharing a single copy of the operating system, along with many other resources, creates many potential single points of failure.
One commonly used technique to provide fault-tolerance (fail-over) depends on a client application to recognize when a server is unavailable to satisfy a request, and if so, either to locate another server or to deny the request altogether. For example, object techniques such as CORBA or Microsoft's Component Object Model (COM) and Distributed Component Object Model (DCOM) may be used to implement this technique. These object architectures require a one-to-one relationship between the client and the server. If the server fails for any reason, the client needs to handle the failure either by finding another server that can perform the same service or by handling an error condition. These approaches require complex and time-consuming communication set-ups to provide sufficient fault tolerance for applications.
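The client-driven fail-over described above may be illustrated with a minimal sketch. It is not taken from CORBA, COM, or DCOM; the names `ServerUnavailable` and `call_with_failover`, and the use of plain callables in place of remote server stubs, are hypothetical and chosen only for illustration.

```python
class ServerUnavailable(Exception):
    """Raised by a (hypothetical) server stub that cannot satisfy a request."""


def call_with_failover(servers, request):
    """Try each candidate server in order and return the first reply.

    `servers` is a list of callables standing in for remote server stubs.
    If every candidate fails, the request is denied by raising
    ServerUnavailable.
    """
    for server in servers:
        try:
            return server(request)
        except ServerUnavailable:
            continue  # this server failed; try to locate another one
    raise ServerUnavailable("no server could satisfy the request")


# Two stand-in servers: one that is down and one that answers.
def failed_server(request):
    raise ServerUnavailable("server is down")


def working_server(request):
    return "handled: " + request
```

Note that the burden of detecting the failure and locating a replacement rests entirely on the client, which is the drawback identified above.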
Another system supporting fail-overs for server processes, available from Microsoft Corp. of Redmond, Washington, is called Microsoft Cluster Server (MSCS). The MSCS system uses a hot-standby technique in which a primary server and a standby server send "keep alive" messages back and forth so that the standby server is activated if it cannot contact the primary server. This is a time-consuming fail-over process. Further, the system is inefficient since computer resources of the standby server are not used until a failure occurs.
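The hot-standby technique may be sketched as follows. This is a simplified illustration and not the MSCS implementation: the class name `StandbyMonitor`, the timeout parameter, and the convention of passing the current time in explicitly are all assumptions made for the example.

```python
class StandbyMonitor:
    """Standby-side view of a hot-standby pair (illustrative only).

    The standby records when it last heard a keep-alive message from the
    primary and activates itself once the primary has been silent for
    longer than `timeout_s` seconds.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_keepalive = None  # time of the most recent keep-alive
        self.active = False         # True once the standby has taken over

    def on_keepalive(self, now):
        """Record a keep-alive message received from the primary."""
        self.last_keepalive = now

    def tick(self, now):
        """Check for a silent primary and report whether the standby is active."""
        silent = (self.last_keepalive is not None
                  and now - self.last_keepalive > self.timeout_s)
        if silent:
            self.active = True
        return self.active
```

Until `tick` reports a takeover, the standby performs no useful work, and the takeover itself cannot begin before the timeout has elapsed. Both effects correspond to the inefficiencies noted above.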
A computer system includes a plurality of interdependent processors. Each interdependent processor executes an independent operating system image without sharing file system state information, and each interdependent processor further has a network access card with a first network connection and a second network connection. The computer system has a first active backplane coupled to each first network connection of each processor; a second active backplane coupled to each second network connection of each processor, the second active backplane operating in lieu of the first active backplane in case of a fail-over; and one or more peripherals connected to each of the first and second active backplanes and responsive to data requests transmitted over the first and second active backplanes.
Implementations of the invention include the following. Each active backplane may be a switch. The switch may be an Ethernet switch. One or more networked data storage devices may be connected to the first and the second active backplanes. Further, one or more servers may be connected to the first or the second active backplane. Each director may be connected to each of the first and second active backplanes. Each director may also be connected to a router. A peripheral device with an address may be coupled to the first or second active backplane. The address may be an Internet Protocol (IP) address. Further, the peripheral device may respond to the IP address when accessed from the first or second active backplane. The address may also be a Media Access Control (MAC) address.
In a second aspect, a method for operating a computer system includes: executing an independent operating system image without sharing file system state information by each processor in a group of interdependent processors, each interdependent processor having a network access card with a first network connection and a second network connection; and transferring data on either a first active backplane coupled to each first network connection of each processor or a second active backplane coupled to each second network connection of each processor, the second active backplane operating in lieu of the first active backplane in case of a fail-over.
Implementations of the method include the following. The transferring step includes routing data over each active backplane using a switch, which may be an Ethernet switch. Data may be accessed from one or more networked data storage devices connected to the first and the second active backplanes. Requests may be communicated from one or more servers over the first or the second active backplane. Each director may be connected to each of the first and second active backplanes to provide load-balancing. Each director may also be connected to a router. A peripheral device connected to the first or second active backplane may be accessed at a predetermined address. The address may be a predetermined Internet Protocol (IP) address, and the peripheral device may be accessed at the predetermined IP address from the first or second active backplane. The address may also be a predetermined Media Access Control (MAC) address.
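The transferring step, in which the second active backplane operates in lieu of the first in case of a fail-over, may be modeled with a short sketch. The classes `Backplane` and `DualHomedProcessor` are hypothetical stand-ins introduced only for this example, and a raised `ConnectionError` is assumed as the way a failed backplane is detected.

```python
class Backplane:
    """Stand-in for one active backplane (for example, an Ethernet switch)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.delivered = []  # payloads successfully carried by this backplane

    def send(self, payload):
        if not self.up:
            raise ConnectionError(self.name + " is down")
        self.delivered.append(payload)


class DualHomedProcessor:
    """A processor whose two network connections reach two backplanes."""

    def __init__(self, first, second):
        self.first = first    # reached over the first network connection
        self.second = second  # reached over the second network connection

    def transfer(self, payload):
        """Send on the first backplane; fail over to the second on error.

        Returns the name of the backplane that carried the payload.
        """
        try:
            self.first.send(payload)
            return self.first.name
        except ConnectionError:
            self.second.send(payload)
            return self.second.name
```

Because a peripheral is connected to both backplanes and answers at the same predetermined address on either, a transfer in this model needs no re-addressing after a fail-over.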
Advantages of the invention include the following. The invention provides scalability and fault tolerance. The invention allows many servers to perform the same task in an active/active scalable manner. The invention also supports load balancing among a pool of like servers. By providing a client process with access to a pool of like servers which are load balanced, the invention keeps the response time for each request to a minimum. Thus, the invention supports high data availability, fast access to shared data, and low administrative costs through data consolidation. Additionally, the invention may be built using standard off-the-shelf components to reduce overall system cost.
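Load balancing among a pool of like servers in an active/active manner may be sketched as shown below. The round-robin policy, the `Director` class, and its method names are assumptions made for illustration; no particular balancing policy is specified above.

```python
class Director:
    """Distributes requests over a pool of like servers (illustrative only).

    Every server in the pool is active. Servers marked as failed are
    skipped, so the remaining servers absorb the load without a separate
    standby having to be activated.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next round-robin candidate

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy server in the pool")
```

For example, a director built over three servers hands requests to each in turn, and continues with the remaining two if one is marked down.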