A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The following U.S. Patent Applications are cross-referenced and incorporated herein by reference:
U.S. patent application No. 60/305,986 entitled "DATA REPLICATION PROTOCOL," by Dean Bernard Jacobs, Reto Kramer, and Ananthan Bala Srinivasan, filed Jul. 16, 2001.
U.S. patent application entitled "EXACTLY ONCE JMS COMMUNICATION" by Dean Bernard Jacobs and Eric Halpern, filed concurrently herewith.
The present invention is related to technology for distributing objects among servers in a network cluster.
In distributed computer systems, it is often the case that several servers and/or networking nodes need to work together. These servers and nodes have to be coordinated, as there is typically networking information that needs to be shared among the machines in order to allow them to function as a single entity. Typical approaches to machine coordination can be very expensive in terms of resources and efficiency.
In general, some synchronization is required for the nodes to agree, as several messages may need to pass between the nodes. This requirement for synchronization may, however, be undesirable in a clustered networking environment. Many clustered environments simply avoid imposing any such synchronization requirement. There are applications, however, where agreement is necessary.
In one case where agreement is needed, a cluster may want exclusive access to a particular device. One such device is a transaction log on a file system. Whenever a transaction is in progress, there are certain objects that need to be saved in a persistent way, such that if a failure occurs those persistently-saved objects can be recovered.
For these objects that need to be saved in one place, there is typically a transaction monitor that runs on each server in that cluster or domain, which then uses a local file system to access the object. Each server can have its own transaction manager such that there is little to no problem with persistence. There is then also no need for coordination, as each server has its own transaction manager.
This arrangement has a drawback, however, when a server fails. For example, there can be a cluster including three servers, each server having a transaction manager. One of those servers can experience a failure or other problem causing the server to be unavailable to the cluster. Because the failed server is the only server having access to a particular transaction log, none of the transactions in that particular log can be recovered until the server is again available to the cluster. Recovery of the log can be difficult or at least inefficient, as a problem with the server can take a significant amount of time to fix. Significant server problems can include such occurrences as the shorting out of a motherboard on the server or a power supply being burnt out.
The present invention includes a system for managing objects, such as can be stored in servers on a network or in a cluster. The system includes a data source, application, or service, such as a file system or Java Message Service component, which can be located inside or outside of a cluster. The system can include several servers in communication with the file system or application, such as through a high-speed network connection.
The system includes a lead server, such as can be agreed upon by the other servers. The lead server can be contained in a hardware cluster or in a software cluster. The system can include an algorithm for selecting a lead server from among the servers, such as an algorithm built into a hardware cluster machine. The lead server in turn will contain a distributed consensus algorithm for selecting a host server, such as a Paxos algorithm. The algorithm used for selecting the lead server can be different from, or the same as, the algorithm used to select the host server.
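By way of illustration only, the selection of a host server by a Paxos-style algorithm might be sketched as follows. This is a simplified, single-process sketch of single-decree Paxos and is not taken from the disclosure: the names `Acceptor` and `propose_host`, the server names, and the use of plain integers as proposal numbers are all assumptions, and networking, retries, and server failure are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical per-server acceptor state for one host-selection decision."""
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the accepted proposal, -1 if none
    accepted_value: Optional[str] = None  # name of the accepted host, if any

    def prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1: promise to ignore proposals numbered below n,
        # and report any value this acceptor has already accepted.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, self.accepted_n, self.accepted_value

    def accept(self, n: int, value: str) -> bool:
        # Phase 2: accept unless a higher-numbered promise has been made.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose_host(acceptors: List[Acceptor], n: int, candidate: str) -> Optional[str]:
    """Run one proposal round; return the chosen host, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a host, the proposer must adopt the
    # value of the highest-numbered accepted proposal instead of its own.
    prior_n, prior_value = max(granted, key=lambda reply: reply[0])
    value = prior_value if prior_n >= 0 else candidate
    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose_host(acceptors, 1, "server-A")    # no prior value, so the candidate is chosen
second = propose_host(acceptors, 2, "server-B")   # the earlier choice is preserved
stale = propose_host(acceptors, 1, "server-C")    # stale proposal number is rejected
```

In this sketch, once a majority has accepted a host, a later proposal with a different candidate still returns the original host, which is the agreement property the lead server relies upon.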
The host server can contain a copy of the item or object, such as can be stored in local cache. The host server can provide local copy access to any server on the network or in a cluster. The host server can also provide the only access point to an object stored in a file system, or the only access point to an application or service. Any change made to an item cached, hosted, or owned by the host server can also be updated in the file system, application, or service.
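By way of illustration only, a host server that caches an object locally and propagates each change to the underlying file system or service might be sketched as follows. The class name `HostServer`, its methods, and the use of a dictionary to stand in for the file system, application, or service are assumptions made for this sketch and do not appear in the disclosure.

```python
class HostServer:
    """Hypothetical host server: sole access point to objects in a backing store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store   # dict standing in for the file system or service
        self.cache = {}      # local cached copies of hosted objects

    def read(self, key):
        # Serve from the local cache, loading from the backing store on first use.
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]

    def write(self, key, value):
        # Update the backing store first, then the cache, so that the cache
        # never holds a change the store has not received.
        self.store[key] = value
        self.cache[key] = value


store = {"txlog": ["t1"]}
host = HostServer("server-1", store)
before = host.read("txlog")
host.write("txlog", ["t1", "t2"])
after = host.read("txlog")
```

Because every write reaches the backing store, a different server can later take over as host by reading the stored copy.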
If the host server becomes unable to host the object, a new host can be chosen using a distributed consensus algorithm. The new host can then pull the necessary data for the object from the file system or service. The other servers in the cluster can be notified that a new server is hosting the object. The servers can be notified by any appropriate means, such as by point-to-point connections or by multicasting.
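By way of illustration only, the failover steps described above might be sketched as follows. The names `Server` and `fail_over` are assumptions, the `choose` argument is merely a placeholder for the distributed consensus algorithm, and the loop that sets `known_host` stands in for notification by point-to-point connections or multicasting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Server:
    """Hypothetical cluster member."""
    name: str
    alive: bool = True
    known_host: Optional[str] = None            # which server this member believes is host
    cache: Dict[str, object] = field(default_factory=dict)


def fail_over(servers: List[Server], store: Dict[str, object], key: str,
              choose: Callable[[List[Server]], Server]) -> Server:
    """Select a new host, load the object from the store, and notify live servers."""
    live = [s for s in servers if s.alive]
    new_host = choose(live)              # placeholder for the consensus algorithm
    new_host.cache[key] = store[key]     # new host pulls the object's data
    for s in live:
        s.known_host = new_host.name     # stand-in for point-to-point or multicast notice
    return new_host


store = {"txlog": ["t1", "t2"]}
servers = [Server("server-1"), Server("server-2"), Server("server-3")]
servers[0].alive = False                 # the previous host has failed
new_host = fail_over(servers, store, "txlog",
                     lambda live: min(live, key=lambda s: s.name))
```

Here the lowest-named live server is picked only to keep the sketch deterministic; an actual system would substitute the outcome of the consensus round.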