1. Field of the Invention
The present invention relates to an improved data processing system and, in particular, to a data processing system for database maintenance.
2. Description of Related Art
Enterprise messaging requirements are evolving beyond traditional store-and-forward e-mail to include the integration of groupware/workflow applications and tight coupling with the "browsing" model of corporate intranets. Another key trend, made possible by the proliferation of Internet standards, is ubiquitous access to information across standards-based networks and data stores. At the same time, the messaging infrastructure must be extended beyond the enterprise to business partners, customers, and suppliers to provide a return on investment in electronic messaging technologies.
As a result of these new imperatives, enterprise and inter-enterprise message traffic is expanding quickly beyond the limitations of disparate legacy systems, loosely coupled mail and intranet systems, and the multitude of gateways connecting them. Indeed, companies are faced with the task of consolidating heterogeneous e-mail systems, delivering access to new information sources, and building a robust messaging infrastructure that meets current and expected enterprise requirements.
A known enterprise messaging solution is Lotus Notes©, which is a platform-independent, distributed client-server architecture. Domino© servers and Lotus Notes© clients together provide a reliable and flexible electronic mail system that "pushes" mail to recipients using industry standard message routing protocols, and that facilitates a "pull" paradigm in which users have the option to embed a link to an object in a message. The object can reside in a Domino© database, an HTTP-based "intranet" data store, a page on the World Wide Web, or even a Windows© OLE link. Lotus Notes© also tightly integrates groupware applications.
Groupware connects users across time and geography, leveraging the power of network computing. Ironically, networks present one of the biggest challenges to groupware implementation. Connections are sometimes unavailable or inadequate for demanding tasks. While this can be due to failure, there are many other reasons including, without limitation, mobile users, remote offices, and high transmission costs. Groupware has to keep users working together through all these scenarios. The technology that makes this possible is so-called replication. A replication mechanism puts information wherever it is needed and synchronizes changes between replicas.
Thus, for example, using Lotus Domino© replication services, an organization wishing to deploy a Web application to multiple locations may set up servers in each location. As data is changed in each location, the architecture ensures that databases are synchronized through replication. As another example, a salesperson who pays frequent visits to customer sites also needs to stay connected to the databases and information at her home office. When she leaves the office with a laptop computer, she makes a copy or replica of the lead tracking and customer service databases that she needs. While out of the office, however, other account managers may make changes to the server database at the same time that she is making her own changes. The salesperson can re-synchronize the two by replicating again over a telephone connection. All updates, additions and deletions that were made to the server after she left the office are now replicated to the laptop database, and all updates, additions and deletions she made on the laptop database are replicated back to the server database. The replication process also detects update conflicts and flags them for the salesperson and other users to reconcile.
There are several known replication techniques that allow workgroup users to connect to a local server and at the same time keep information synchronized across geographically dispersed sites. Documents in the replicated database are composed of fields. When two servers desire to synchronize their respective versions of a given document, the most recent entry for each field of the document is often used for replication purposes. If timely replication is desired, updates to one replica are propagated to other replicas as soon as possible. For example, in Lotus Notes© Clustering Release 4.5, replication is effected by having every server convey an update to every other server whenever a local update occurs. Because each update generates a message to every other server, this approach does not scale readily as servers are added. Another approach is "scheduled replication", wherein a pair of servers periodically wake up and compare data sets. This requires every data set on both servers to be compared and is strictly a two-way operation. Scheduled replication is costly and cannot be done in a timely fashion, and it also creates a significant amount of undesirable network traffic.
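The field-level rule described above can be sketched as a per-field merge in which the entry with the later modification stamp wins. This is a minimal illustration only; the `merge_fields` function, the dictionary layout, and the use of integer stamps are assumptions, not features of any named product.

```python
from typing import Dict, Tuple

# Hypothetical layout: field name -> (value, modification stamp).
Document = Dict[str, Tuple[str, int]]


def merge_fields(local: Document, remote: Document) -> Document:
    """Merge two replicas of one document, keeping the most recent entry for each field.

    On equal stamps the local entry is kept (an arbitrary, assumed tie-break).
    """
    merged: Document = dict(local)
    for name, (value, stamp) in remote.items():
        # Take the remote entry only if the field is new or was modified later.
        if name not in merged or stamp > merged[name][1]:
            merged[name] = (value, stamp)
    return merged


local = {"title": ("Q3 leads", 5), "owner": ("Ann", 2)}
remote = {"title": ("Q3 leads rev", 4), "owner": ("Bob", 7), "status": ("open", 3)}
print(merge_fields(local, remote))
```

Because fields are compared individually, a change to one field need not carry the rest of the document with it.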
Other known techniques (e.g., Microsoft Exchange©) provide a simple, first-generation, messaging-based replication scheme. This technique relies on store-and-forward mail to push changes from one server to other defined replicas on other servers. There is no comparison operation, however, to guarantee that replicas remain synchronized. Such a system significantly increases administrative and end-user burden. Moreover, if a user changes even a single property or field of a document, the entire document must be replicated rather than just the property or field. Netscape Suitespot© uses proxy servers for locally caching Web pages, which reduces network bandwidth requirements. This technique, however, is merely duplication (copying files from a distant place to a closer place), and there is no relationship between the copies. It is not a true replication mechanism.
There remains a need to provide enhanced replication schemes that address the deficiencies in the prior art.
It is a primary object of this invention to replicate data in a timely manner across a large number of nodes.
It is another primary object of this invention to provide replication enhancements in a distributed system that significantly reduce network traffic.
It is still another primary object of this invention to provide high performance, near-realtime replication in a geographically-dispersed network topology.
Still another primary object is to provide a simple replication mechanism that is highly scalable.
A particular object of this invention is to configure a replication mechanism within a hub and spoke network architecture.
Still another particular object is to enable sliding window acknowledgment through the hub on a broadcast to nodes in the network architecture.
Yet another object of the present invention is to enable spokes to issue periodic acknowledgments to the central hub, and for the central hub to issue periodic acknowledgments to the originating spokes, wherein such acknowledgments effectively indicate the vitality of the nodes within the system as well as any need for packet retransmission.
Still another object of this invention is to provide a hub recovery mechanism to enable a set of server nodes to transfer operation from a failed hub to a new central hub. The invention also provides a spoke failure mechanism for isolating a failed spoke from a group of destination nodes that are targeted to receive update(s), and for selectively re-admitting the failed spoke back into the destination group.
A further object of this invention is to implement a multilevel replication mechanism, for example, wherein first level hubs participate as second level spokes in a recursible architecture extendible to any depth desired.
Still another object of this invention is to enable a plurality of updates to be batched, collected and distributed together by the replication mechanism.
Another more general object is to synchronize multiple database replicas in a distributed computer environment. These databases, for example, may be local replicas of databases on a large number of servers, registry information servers, domain name servers, LDAP directory servers, public key security servers, or the like.
These and other objects are provided in a method for replicating data in a distributed system comprising a plurality of originating nodes associated with a central hub. Each of the plurality of originating nodes sends updates and associated origination sequence numbers to the central hub. A given update is directed to a distribution group comprising a set or subset of the originating nodes and typically comprises the changes in a given data set supported in a database on each such originating node. According to the method, the central hub receives, packages and sends the updates with associated distribution sequence numbers to the plurality of originating nodes. In the central hub, acknowledgments sent by originating nodes are then tracked. Each acknowledgment preferably identifies a last in-sequence distribution sequence number processed by a respective originating node. The central hub then periodically sends a message to each originating node. The message includes information identifying the highest origination sequence number acknowledged by the originating nodes comprising the given distribution group, together with the highest origination sequence number associated with an update received at the central hub from that originating node.
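The hub-side bookkeeping summarized above might look like the following minimal Python sketch. It assumes a single distribution group containing every node, string payloads, and a hub that accepts only the next in-order origination sequence number from each node (leaving gaps to be filled by retransmission); the class name `ReplicationHub` and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ReplicationHub:
    """Hypothetical central hub that sequences and tracks updates for one distribution group."""

    def __init__(self, nodes: List[str]) -> None:
        self.next_dist = 1
        # distribution seq -> (originating node, origination seq, payload)
        self.log: Dict[int, Tuple[str, int, str]] = {}
        # last in-sequence distribution seq acknowledged by each node
        self.acked: Dict[str, int] = {n: 0 for n in nodes}
        # highest in-order origination seq received from each node
        self.received: Dict[str, int] = {n: 0 for n in nodes}

    def receive(self, origin: str, orig_seq: int, payload: str) -> Optional[int]:
        """Accept the next in-order update from origin and stamp it with a distribution seq."""
        if orig_seq != self.received[origin] + 1:
            return None  # duplicate or out of order; the node will retransmit
        self.received[origin] = orig_seq
        dist_seq = self.next_dist
        self.next_dist += 1
        self.log[dist_seq] = (origin, orig_seq, payload)
        return dist_seq

    def acknowledge(self, node: str, last_in_sequence: int) -> None:
        """Record a node's acknowledgment; acknowledgments never move backwards."""
        self.acked[node] = max(self.acked[node], last_in_sequence)

    def status_for(self, origin: str) -> Tuple[int, int]:
        """Periodic message for origin: (highest origination seq acknowledged by the
        whole group, highest origination seq the hub has received from origin)."""
        group_floor = min(self.acked.values())
        acked_by_group = max(
            (o_seq for d, (o, o_seq, _) in self.log.items() if d <= group_floor and o == origin),
            default=0,
        )
        return acked_by_group, self.received[origin]
```

In this sketch, the group-wide value is derived from the lowest distribution sequence number that every member has acknowledged.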
Thus, in the inventive scheme, the originating node applies its origination sequence number to a given update and the central hub applies the hub distribution sequence number to its broadcast of that update. The periodic acknowledgment by the central hub triggers retransmission of dropped packets from the nodes, and the periodic acknowledgment by the nodes triggers retransmission of dropped packets from the hub. The periodic acknowledgments also serve as a "heartbeat" to indicate the vitality of the nodes within the system.
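On the spoke side, the hub's periodic message can drive both cleanup and retransmission, and an overdue periodic acknowledgment can serve as the heartbeat check. The sketch below assumes the hub message carries the two values described earlier in this summary (highest origination sequence number acknowledged by the group, highest received by the hub). `Spoke`, `on_hub_status`, `silent_nodes`, and the simple timeout-based liveness test are illustrative assumptions.

```python
from typing import Dict, List, Tuple


class Spoke:
    """Hypothetical originating node that numbers its own updates and retains them until safe."""

    def __init__(self) -> None:
        self.next_orig = 1
        # origination seq -> payload still awaiting acknowledgment by the whole group
        self.pending: Dict[int, str] = {}

    def originate(self, payload: str) -> int:
        seq = self.next_orig
        self.next_orig += 1
        self.pending[seq] = payload
        return seq

    def on_hub_status(self, acked_by_group: int, received_by_hub: int) -> List[Tuple[int, str]]:
        """Handle the hub's periodic message and return the updates to retransmit."""
        for seq in [s for s in self.pending if s <= acked_by_group]:
            del self.pending[seq]  # every group member has it; no longer needed
        # Anything the hub has not received was dropped in transit (or is still in flight).
        return [(s, self.pending[s]) for s in sorted(self.pending) if s > received_by_hub]


def silent_nodes(last_heard: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Nodes whose periodic acknowledgment (heartbeat) is overdue."""
    return sorted(n for n, t in last_heard.items() if now - t > timeout)
```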
According to the replication scheme, updates and associated distribution sequence numbers are only sent (by the central hub) to originating nodes having no more than a permitted quantity of unacknowledged updates. The central hub also rebroadcasts updates and associated distribution sequence numbers to originating nodes whose acknowledgments indicate lack of receipt of updates from the central hub. Rebroadcasting begins with an update associated with a lowest unacknowledged distribution sequence number. Upon failure of a given originating node, the node is isolated from other nodes in the given distribution group. If the failed node later resurfaces, a node failure recovery scheme is implemented. In particular, the failed originating node is associated with another originating node (a "buddy" node) in the given distribution group. The failed node is then provided with a current copy of a data set from the buddy node. The failed node rejoins the group and becomes eligible for updates at the start of the copy from the buddy node, although actual transmission is likely to be delayed until after the node tells the hub its copy has completed.
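The windowed distribution, rebroadcast, isolation, and buddy-based re-admission steps can be sketched as follows. The window size, the class and method names, and the assumption that a buddy's copy reflects every update the buddy has acknowledged are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


class WindowedHub:
    """Hypothetical hub applying a per-node sliding window to its broadcasts."""

    def __init__(self, nodes: List[str], window: int) -> None:
        self.window = window  # permitted number of unacknowledged updates per node
        self.log: List[str] = []  # distribution seq N is stored at index N - 1
        self.acked: Dict[str, int] = {n: 0 for n in nodes}
        self.sent: Dict[str, int] = {n: 0 for n in nodes}
        self.held: Set[str] = set()  # re-admitted nodes still copying from a buddy

    def append(self, payload: str) -> int:
        self.log.append(payload)
        return len(self.log)

    def sendable(self, node: str) -> List[Tuple[int, str]]:
        """New updates that fit in the node's window; marks them as sent."""
        if node in self.held:
            return []
        limit = min(len(self.log), self.acked[node] + self.window)
        batch = [(seq, self.log[seq - 1]) for seq in range(self.sent[node] + 1, limit + 1)]
        self.sent[node] = max(self.sent[node], limit)
        return batch

    def on_ack(self, node: str, last_in_sequence: int) -> None:
        self.acked[node] = max(self.acked[node], last_in_sequence)

    def rebroadcast(self, node: str) -> List[Tuple[int, str]]:
        """Resend everything unacknowledged, starting at the lowest unacknowledged seq."""
        return [(seq, self.log[seq - 1]) for seq in range(self.acked[node] + 1, self.sent[node] + 1)]

    def isolate(self, node: str) -> None:
        """Remove a failed node from the distribution group."""
        self.acked.pop(node, None)
        self.sent.pop(node, None)
        self.held.discard(node)

    def readmit(self, node: str, buddy: str) -> None:
        """Node becomes eligible from the point the buddy's copy starts."""
        start = self.acked[buddy]
        self.acked[node] = start
        self.sent[node] = start
        self.held.add(node)

    def copy_complete(self, node: str) -> None:
        self.held.discard(node)
```

Holding the re-admitted node until `copy_complete` mirrors the point that the node is eligible from the start of the copy while actual transmission waits for the copy to finish.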
The replication mechanism also implements a hub recovery mechanism. In particular, upon failure of the central hub, a given subset or quorum of the originating nodes confer to designate a substitute central hub. Thereafter, hub responsibilities are transferred to the substitute central hub.
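The summary does not specify how the quorum chooses the substitute hub. One plausible rule, shown here purely as an assumption, requires a strict majority of the group to be reachable and then picks the reachable node that has acknowledged the highest distribution sequence number, breaking ties by name, so that the new hub starts with the most complete view of the update stream. The function name `choose_substitute_hub` is hypothetical.

```python
from typing import Dict, Optional, Set


def choose_substitute_hub(
    members: Set[str], reachable: Set[str], last_acked: Dict[str, int]
) -> Optional[str]:
    """Return the node to promote to central hub, or None if no quorum is present.

    Hypothetical rule: a strict majority of members must be reachable; among them,
    prefer the highest acknowledged distribution sequence number, then the smallest name.
    """
    voters = reachable & members
    if len(voters) * 2 <= len(members):
        return None  # without a majority, two partitions could each pick a hub
    # max() returns the first maximal element, so sorting gives a name-based tie-break.
    return max(sorted(voters), key=lambda n: last_acked.get(n, 0))
```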
According to a preferred embodiment, a given originating node sends a plurality of updates to the central hub in a package. Likewise, the central hub sends a plurality of updates to a given originating node in a package. Thus, the mechanism processes updates in a batched manner.
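Batching might be implemented by grouping consecutive updates into packages bounded by an update count and a byte budget. The limits, the `package_updates` name, and the rule that an oversized update travels alone in its own package are assumptions for illustration.

```python
from typing import List, Tuple

Update = Tuple[int, bytes]  # (sequence number, payload)


def package_updates(updates: List[Update], max_count: int, max_bytes: int) -> List[List[Update]]:
    """Split an ordered stream of updates into packages, preserving sequence order."""
    packages: List[List[Update]] = []
    current: List[Update] = []
    size = 0
    for seq, payload in updates:
        # Close the current package if adding this update would exceed either limit.
        if current and (len(current) >= max_count or size + len(payload) > max_bytes):
            packages.append(current)
            current, size = [], 0
        current.append((seq, payload))
        size += len(payload)
    if current:
        packages.append(current)
    return packages
```

The same routine could serve both directions, since spokes and the hub each send batched packages.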
The foregoing has outlined some of the more pertinent objects and features of the present invention. These objects and features should be construed to be merely illustrative of some of the more prominent features and applications of the invention. Many other beneficial results can be attained by applying the disclosed invention in a different manner or modifying the invention as will be described. Accordingly, other objects and a fuller understanding of the invention may be had by referring to the following Detailed Description of the preferred embodiment.